Koca Ventures Ltd
71-75 Shelton Street
Covent Garden, London
WC2H 9JQ, United Kingdom
Registered in England & Wales — 16231043
An AI voice agent that answers your phones, running on your own hardware.
AI voice agents that answer business calls — booking, routing, FAQs, callbacks — with every second of audio staying on your own network. We're honest about the limits: natural turn-taking, not a human impersonation.
Where on-prem voice earns its place
Clinics & regulated practices
After-hours booking, insurance and hours questions, and triage-then-transfer. The cleanest fit when patient data can't sit on a multi-tenant cloud.
Dealerships & dealer groups
Service-appointment booking, multi-location call routing, and lead capture when every line is busy. The agent handles the routine ask and routes the rest to the right desk.
Restaurants & hospitality
Reservations, party-size and availability, menu questions, and confirmations — so the phone stops pulling staff off the floor during a rush.
After-hours & overflow reception
Most callers who go unanswered never ring back, and a large share of inbound calls arrives after hours or when every line is busy. The agent covers overflow, books or takes a message, and escalates anything real.
The routine calls, handled — the hard ones, handed over
Booking and rescheduling, call routing, the questions you're asked all day, callbacks, and the hours your team can't cover. The agent handles the routine majority of calls end to end and transfers anything real to a person, with the context already gathered.
Every layer runs on hardware you control
Speech-to-text (local)
faster-whisper as the workhorse — robust across languages including Turkish, running on your own GPU. whisper.cpp is the CPU fallback where there's no NVIDIA card.
Turn-taking & barge-in (local)
Silero VAD for speech detection plus LiveKit's semantic turn-detector, so the agent knows that “I need to think about that…” isn't the end of a turn. Runs on CPU and covers Turkish.
The dialogue model (local)
A self-hosted LLM (Qwen3 or a Llama-class 8B) served by vLLM for concurrent callers, or Ollama for the simple single-line case — kept in VRAM so the agentic loop stays fast.
Text-to-speech (local)
Kokoro or Piper for commercial-safe local voices; XTTS-v2 for cloned voices where a licence allows it. This is the layer with the honest caveat below.
Telephony & the SIP bridge
A real phone number via a SIP trunk (Twilio or Telnyx) bridged into a self-hosted media server — Asterisk, FreeSWITCH, or LiveKit SIP. Your existing PBX stays; only the lines you choose route to the agent.
Orchestration
LiveKit Agents or Pipecat ties the pipeline together — streaming every stage and handling interruptions. We run it on your machines or, as a managed option, on our own two-node GPU cluster.
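To make "handling interruptions" concrete, here is a minimal sketch of barge-in using only Python's standard library. It is not LiveKit Agents or Pipecat code: `speak`, `fake_vad`, and `agent_turn` are hypothetical stand-ins for the streaming TTS, the voice-activity detector, and the orchestrator, and the timings are arbitrary. The idea it illustrates is that playback runs as a cancellable task, and the moment the detector hears the caller, the agent stops talking.

```python
import asyncio
from typing import List, Optional, Tuple

# Hypothetical agent reply used for the demonstration.
REPLY = ("We have a slot on Tuesday at ten or Thursday at two "
         "which works better for you")


async def speak(text: str, spoken: List[str]) -> None:
    """Stand-in for streaming TTS: 'plays' one word every 20 ms."""
    for word in text.split():
        spoken.append(word)
        await asyncio.sleep(0.02)


async def fake_vad(after_s: float, barge_in: asyncio.Event) -> None:
    """Stand-in for the VAD: reports caller speech after a delay."""
    await asyncio.sleep(after_s)
    barge_in.set()


async def agent_turn(
    reply: str, interrupt_after_s: Optional[float] = None
) -> Tuple[List[str], bool]:
    """Play a reply, stopping early if the caller starts talking.

    Returns the words actually spoken and whether the turn was interrupted.
    """
    spoken: List[str] = []
    barge_in = asyncio.Event()
    tts = asyncio.create_task(speak(reply, spoken))
    watcher = asyncio.create_task(barge_in.wait())
    vad = None
    if interrupt_after_s is not None:
        vad = asyncio.create_task(fake_vad(interrupt_after_s, barge_in))

    # Whichever finishes first wins: the reply ends, or the caller barges in.
    await asyncio.wait({tts, watcher}, return_when=asyncio.FIRST_COMPLETED)
    interrupted = not tts.done()

    # Cancel whatever is still running so nothing keeps talking or listening.
    pending = [t for t in (tts, watcher, vad) if t is not None and not t.done()]
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return spoken, interrupted


if __name__ == "__main__":
    words, cut_off = asyncio.run(agent_turn(REPLY, interrupt_after_s=0.05))
    print(f"interrupted={cut_off}, spoke {len(words)} of {len(REPLY.split())} words")
```

In a real deployment the orchestration framework owns this loop; the sketch only shows why every stage has to be streamed and cancellable for interruption to feel natural.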
What on-prem voice can't do (yet) — said plainly
Latency is ~0.5–1.2 seconds, not human-equal
A well-tuned local stack lands around half a second to just over a second, end to end. That's natural, interruptible turn-taking — but a human leaves roughly a 200ms gap, and even the fastest cloud speech is around 0.8–1.1s. We won't tell you it's indistinguishable from a person, because it isn't.
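One way to reason about that figure is as a budget split across the stages of the pipeline. The sketch below is illustrative only: the per-stage numbers are hypothetical placeholders, not measurements from any deployment, and only the 0.5 to 1.2 second range comes from the text above. A plain sum is also a simplification, since streaming lets stages overlap in practice.

```python
# End-to-end range quoted in the text above, in milliseconds.
BUDGET_MS = (500, 1200)

# Hypothetical per-stage figures, invented for illustration only.
STAGES_MS = {
    "endpointing (VAD + turn detector)": 200,
    "speech-to-text": 150,
    "LLM first token": 250,
    "TTS first audio": 150,
    "SIP and network": 50,
}


def total_latency_ms(stages: dict) -> int:
    """Sum the stages as if they ran strictly one after another."""
    return sum(stages.values())


def within_budget(total_ms: int, budget: tuple = BUDGET_MS) -> bool:
    """True when the total falls inside the quoted range."""
    low, high = budget
    return low <= total_ms <= high


def slowest_stage(stages: dict) -> str:
    """Name the stage that contributes the most delay."""
    return max(stages, key=stages.get)


if __name__ == "__main__":
    total = total_latency_ms(STAGES_MS)
    print(f"total: {total} ms, within budget: {within_budget(total)}")
    print(f"tune first: {slowest_stage(STAGES_MS)}")
```

Measuring each stage on the actual hardware and replacing the placeholders shows where tuning effort pays off first.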
Turkish text-to-speech is the weak link
The commercial-safe local Turkish voice (Piper) is more robotic; the most natural one (XTTS-v2) carries a non-commercial licence that needs a separate agreement before production use. Turkish speech-to-text and turn-taking are solid — but on TTS naturalness, we set expectations honestly rather than over-sell.
On-prem trades cloud uptime for data control
Cloud platforms give you 99.9%+ geo-redundant uptime out of the box. Running on your own hardware means a failure on your premises is a real event that needs handling — which is why on-prem comes with a maintenance and monitoring retainer, and a hybrid failover path where it's warranted.
Sometimes cloud is simply the better fit
Below a certain call volume, a managed cloud platform is cheaper and faster to stand up, and we'll say so. On-prem makes sense when data sovereignty, regulation, or sustained high volume is a real constraint, not as a default for everyone.
We don't sell “an AI that replaces your staff” or a voice indistinguishable from a human — neither is true, and you'd find out on the first call. We build for the routine load and are honest about where a person takes over.
Privacy by reduction: because the audio never leaves your network, there's no third-party processor in the loop and no cloud recording of your calls. This is on-prem, offline-capable voice, and it sits alongside our other on-premise agentic systems. It is a separate capability from our on-prem real-estate CRM, which does document intelligence and has no voice component.
Straight answers
How is this different from Vapi, Retell, or ElevenLabs?
They're excellent cloud platforms, and faster to ship — we won't pretend otherwise. Our wedge: the whole pipeline runs on your hardware, so audio, transcripts, and caller data never leave your network. If “no call recordings on a multi-tenant cloud” is your constraint, that's the gap we work in.
How do you price it?
Per engagement — there's no list price. On-prem flips the per-minute cloud model: a one-time build and integration fee, then a flat monthly or per-line charge, because the compute is owned, not metered. The carrier's per-minute cost passes through.
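The shape of that model can be turned into a rough break-even calculation. Every figure below is an invented placeholder (there is no list price), and the sketch assumes the cloud alternative also bills the carrier's per-minute cost, so that term cancels on both sides. It is a back-of-envelope sketch, not a quote.

```python
def onprem_monthly_cost(build_fee: float, amortise_months: int,
                        flat_monthly: float) -> float:
    """Fixed monthly cost of on-prem: amortised build fee plus flat charge."""
    return build_fee / amortise_months + flat_monthly


def break_even_minutes(build_fee: float, amortise_months: int,
                       flat_monthly: float, cloud_per_minute: float) -> float:
    """Monthly call minutes above which on-prem costs less than cloud.

    Carrier per-minute cost is left out because it is assumed to apply
    equally to both options.
    """
    fixed = onprem_monthly_cost(build_fee, amortise_months, flat_monthly)
    return fixed / cloud_per_minute


if __name__ == "__main__":
    # Hypothetical inputs, in arbitrary currency units.
    minutes = break_even_minutes(
        build_fee=12_000,       # one-time build and integration
        amortise_months=24,     # period the build fee is spread over
        flat_monthly=400,       # flat monthly or per-line charge
        cloud_per_minute=0.10,  # cloud platform's metered rate
    )
    print(f"break-even: about {minutes:,.0f} call minutes per month")
```

Below the break-even volume the metered cloud model is cheaper on cost alone; above it, owned compute starts to win, before data-control requirements are even considered.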
Is an AI voice agent UK GDPR compliant?
It can be — and on-prem is the cleanest path, because there's no third-party processor in the loop. You remain the data controller: call-recording notice, retention, and lawful basis stay yours to set. We build the system; the compliance sign-off stays with you.
Can it book appointments and route calls?
Yes — that's the core of it: booking and rescheduling, call routing, FAQs, callbacks, and after-hours or overflow coverage, with anything real transferred to a person, context attached.
Tell us about your call flow
Tell us where the phone hurts — missed after-hours bookings, overflow on busy lines, routine questions eating your team's day — and we'll scope a voice agent around your real call flow.
