Koca Ventures Ltd
71-75 Shelton Street
Covent Garden, London
WC2H 9JQ, United Kingdom
Registered in England & Wales, No. 16231043

ON-PREM PHONE VOICE AGENTS

An AI voice agent that answers your phones, running on your own hardware.

AI voice agents that answer business calls — booking, routing, FAQs, callbacks — with every second of audio staying on your own network. We're honest about the limits: natural turn-taking, not a human impersonation.

WHO IT'S FOR

Where on-prem voice earns its place

01

Clinics & regulated practices

After-hours booking, insurance and hours questions, and triage-then-transfer. The cleanest fit when patient data can't sit on a multi-tenant cloud.

02

Dealerships & dealer groups

Service-appointment booking, multi-location call routing, and lead capture when every line is busy. The agent handles the routine ask and routes the rest to the right desk.

03

Restaurants & hospitality

Reservations, party-size and availability, menu questions, and confirmations — so the phone stops pulling staff off the floor during a rush.

04

After-hours & overflow reception

Most callers who reach no one never call back, and a large share of inbound calls arrive after hours or when every line is busy. The agent covers overflow, books or takes a message, and escalates anything real.

WHAT IT DOES

The routine calls, handled — the hard ones, handed over

Booking and rescheduling, call routing, the questions you're asked all day, callbacks, and the hours your team can't cover. The agent handles the routine majority of calls end to end and transfers anything real to a person, with the context already gathered.
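As a rough illustration of the handle-or-hand-over split, here is a minimal Python sketch. It is not our production code: the intent labels, the `ROUTINE_INTENTS` set, and the `CallContext` and `Decision` types are hypothetical, and in a real deployment the intent would come from the dialogue model rather than being passed in.

```python
from dataclasses import dataclass, field

# Hypothetical intents the agent completes end to end.
ROUTINE_INTENTS = {"book", "reschedule", "hours", "faq", "callback"}


@dataclass
class CallContext:
    """What the agent has gathered so far (hypothetical fields)."""
    caller_number: str
    intent: str
    notes: list[str] = field(default_factory=list)


@dataclass
class Decision:
    action: str   # "handle" or "transfer"
    summary: str  # context handed to whoever takes the call


def route(ctx: CallContext) -> Decision:
    """Finish routine intents; transfer anything else with context attached."""
    summary = f"{ctx.caller_number}: {ctx.intent}"
    if ctx.notes:
        summary += " | " + "; ".join(ctx.notes)
    action = "handle" if ctx.intent in ROUTINE_INTENTS else "transfer"
    return Decision(action, summary)
```

A call tagged `complaint` with notes such as `wants refund` comes back as a transfer whose summary carries those notes, so the person who picks up doesn't start from zero.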

THE ON-PREM STACK

Every layer runs on hardware you control

01

Speech-to-text (local)

faster-whisper as the workhorse — robust across languages including Turkish, running on your own GPU. whisper.cpp is the CPU fallback where there's no NVIDIA card.

02

Turn-taking & barge-in (local)

Silero VAD for speech detection plus LiveKit's semantic turn-detector, so the agent knows that “I need to think about that…” isn't the end of a turn. Runs on CPU and covers Turkish.

03

The dialogue model (local)

A self-hosted LLM (Qwen3 or a Llama-class 8B) served by vLLM for concurrent callers, or Ollama for the simple single-line case — kept in VRAM so the agentic loop stays fast.

04

Text-to-speech (local)

Kokoro or Piper for commercial-safe local voices; XTTS-v2 for cloned voices where a licence allows it. This is the layer with the honest caveat below.

05

Telephony & the SIP bridge

A real phone number via a SIP trunk (Twilio or Telnyx) bridged into a self-hosted media server — Asterisk, FreeSWITCH, or LiveKit SIP. Your existing PBX stays; only the lines you choose route to the agent.

06

Orchestration

LiveKit Agents or Pipecat ties the pipeline together — streaming every stage and handling interruptions. We run it on your machines or, as a managed option, on our own two-node GPU cluster.
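The turn-taking layer combines two signals: Silero VAD says whether the caller is making sound, and the semantic turn detector estimates whether what they've said so far is finished. A minimal sketch of one way to combine them follows. It calls neither library; `silence_s` and `eot_prob` stand in for their outputs, and `TurnPolicy`, `should_barge_in`, and every threshold are hypothetical placeholders, not LiveKit's API or tuned values.

```python
from dataclasses import dataclass


@dataclass
class TurnPolicy:
    # Placeholder thresholds (hypothetical, not tuned values).
    min_silence_s: float = 0.3  # shorter pauses never end a turn
    max_silence_s: float = 2.0  # longer pauses always end a turn
    eot_threshold: float = 0.6  # end-of-turn probability needed in between

    def turn_ended(self, silence_s: float, eot_prob: float) -> bool:
        """Decide whether the caller has finished speaking.

        silence_s: seconds since the VAD last heard speech.
        eot_prob: probability from a semantic turn detector that the
                  transcript so far is a complete turn.
        """
        if silence_s < self.min_silence_s:
            return False
        if silence_s >= self.max_silence_s:
            return True
        return eot_prob >= self.eot_threshold


def should_barge_in(agent_speaking: bool, caller_speech_s: float,
                    min_speech_s: float = 0.2) -> bool:
    """Barge-in: stop the agent's audio once the caller has talked over it
    for at least min_speech_s seconds (filters out coughs and clicks)."""
    return agent_speaking and caller_speech_s >= min_speech_s
```

With these placeholder values, 0.8 s of silence after "I need to think about that…" (a low end-of-turn probability) keeps the line open, while the same pause after a complete answer hands the turn to the agent; a long enough silence ends the turn regardless.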

HONEST LIMITS

What on-prem voice can't do (yet) — said plainly

01

Latency is ~0.5–1.2 seconds, not human-equal

A well-tuned local stack lands between about half a second and just over a second, end to end. That's natural, interruptible turn-taking, but a human typically leaves a gap of roughly 200 ms, and even the fastest cloud voice stacks sit around 0.8–1.1 s. We won't tell you it's indistinguishable from a person, because it isn't.

02

Turkish text-to-speech is the weak link

The most natural local Turkish voice, XTTS-v2, is released under a non-commercial licence, so production use needs a separate agreement; the commercial-safe alternative, Piper, sounds noticeably more robotic. Turkish speech-to-text and turn-taking are solid, but on TTS naturalness we set expectations honestly rather than over-sell.

03

On-prem trades cloud uptime for data control

Cloud platforms give you 99.9%+ geo-redundant uptime out of the box. Running on your own hardware means a failure on your premises is a real event that needs handling — which is why on-prem comes with a maintenance and monitoring retainer, and a hybrid failover path where it's warranted.

04

Sometimes cloud is simply the better fit

Below a certain call volume, a managed cloud platform is cheaper and faster to stand up, and we'll say so. On-prem makes sense when data sovereignty, regulation, or sustained high volume are real constraints — not as a default for everyone.
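The 0.5–1.2 s figure in the first limit is easiest to reason about as a budget across stages. A rough sketch, assuming streaming stages so the caller waits on the sum of each stage's time to first output; the stage names and per-stage numbers are hypothetical placeholders, not measurements, and should be replaced with timings from your own calls.

```python
# Figures quoted in the text; everything else is an assumption.
HUMAN_GAP_S = 0.2
TARGET_BAND_S = (0.5, 1.2)


def time_to_first_audio(stages: dict[str, float]) -> float:
    """Critical-path latency: each stage's time to *first* output, in seconds.

    With streaming, later stages start as soon as earlier ones emit their first
    chunk, so the caller waits for the sum of first-output delays, not full runs.
    """
    return sum(stages.values())


def assess(stages: dict[str, float]) -> str:
    total = time_to_first_audio(stages)
    lo, hi = TARGET_BAND_S
    if total < lo:
        verdict = "below the quoted band"
    elif total <= hi:
        verdict = "within the quoted band"
    else:
        verdict = "above the quoted band"
    gap = total - HUMAN_GAP_S
    return f"{total:.2f}s ({verdict}, {gap:.2f}s slower than a typical human gap)"


# Placeholder per-stage budgets (hypothetical, not measurements).
example = {
    "endpointing (VAD + turn detector)": 0.30,
    "STT final transcript": 0.10,
    "LLM first token": 0.20,
    "TTS first audio": 0.10,
}
print(assess(example))
```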

We don't sell “an AI that replaces your staff” or a voice indistinguishable from a human — neither is true, and you'd find out on the first call. We build for the routine load and are honest about where a person takes over.

Privacy by reduction: because the audio is processed on your own network, there's no third-party speech or model provider in the loop and no cloud recording of your calls. The carrier that connects the call still carries it, as with any phone line.

This is on-prem, offline-capable voice, and it sits alongside our other on-premise agentic systems. It is a separate capability from our on-prem real-estate CRM, which does document intelligence and has no voice component.

QUESTIONS

Straight answers

How is this different from Vapi, Retell, or ElevenLabs?

They're excellent cloud platforms, and faster to ship — we won't pretend otherwise. Our wedge: the whole pipeline runs on your hardware, so audio, transcripts, and caller data never leave your network. If “no call recordings on a multi-tenant cloud” is your constraint, that's the gap we work in.

How do you price it?

Per engagement — there's no list price. On-prem flips the per-minute cloud model: a one-time build and integration fee, then a flat monthly or per-line charge, because the compute is owned, not metered. The carrier's per-minute cost passes through.
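That pricing shape is also why, as the limits section says, cloud wins below a certain call volume. A back-of-envelope sketch, assuming the one-time build fee B and any hardware cost H are amortised over T months, a flat monthly charge M covers the lines (fold any maintenance retainer into M), the cloud platform bills p per minute, and the carrier's per-minute cost passes through in both models so it cancels. On-prem is then cheaper once monthly minutes exceed ((B + H) / T + M) / p. All names and figures here are hypothetical.

```python
def monthly_costs(minutes: float, *, build_fee: float, hardware: float,
                  months: int, monthly_fee: float, platform_rate: float,
                  carrier_rate: float) -> tuple[float, float]:
    """Return (on_prem, cloud) cost for one month at the given call volume.

    monthly_fee: the flat monthly or per-line total, plus any retainer.
    platform_rate: the cloud platform's per-minute price, excluding carrier.
    """
    carrier = carrier_rate * minutes  # passed through in both models
    on_prem = (build_fee + hardware) / months + monthly_fee + carrier
    cloud = platform_rate * minutes + carrier
    return on_prem, cloud


def break_even_minutes(*, build_fee: float, hardware: float, months: int,
                       monthly_fee: float, platform_rate: float) -> float:
    """Monthly minutes above which on-prem costs less; carrier cost cancels out."""
    fixed = (build_fee + hardware) / months + monthly_fee
    return fixed / platform_rate
```

For example, with placeholder inputs of B = H = 12,000 over T = 24 months, M = 200, and p = 0.125 per minute, the fixed side is 1,200 a month and break-even lands at 9,600 minutes a month.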

Is an AI voice agent UK GDPR compliant?

It can be, and on-prem is the cleanest path, because no third-party speech or model provider processes your calls. You remain the data controller: call-recording notice, retention, and lawful basis stay yours to set. We build the system; the compliance sign-off stays with you.

Can it book appointments and route calls?

Yes — that's the core of it: booking and rescheduling, call routing, FAQs, callbacks, and after-hours or overflow coverage, with anything real transferred to a person, context attached.

READY TO TALK?

Tell us about your call flow

Tell us where the phone hurts — missed after-hours bookings, overflow on busy lines, routine questions eating your team's day — and we'll scope a voice agent around your real call flow.