A Twi voice assistant that understands “me pɛ sɛ me kɔ Accra nnɛ” and responds naturally remains largely unavailable in 2026, despite growing demand from Ghana’s 9 million Asante Twi speakers and millions more who use Akuapem Twi, Fante, Ga, Ewe, or Hausa daily. This guide examines the current state of voice AI for Ghanaian languages, tests the handful of tools claiming local-language support, names the gaps blocking mass adoption, and shows developers and organisations what infrastructure exists today to build these systems themselves.
Table of Contents
- TL;DR
- The Voice Assistant Stack: What's Missing for Twi
- Where Twi STT stands in 2026
- Text-to-speech: robotic but functional
- Natural language understanding: the unsolved piece
- What Actually Works Right Now
- 1. Twi Speech Commands (Limited Vocabulary)
- 2. Voice Banking in Twi (Fidelity Bank Ghana)
- 3. Twi Voice Search (Experimental)
- Ga, Ewe, Hausa: Even Further Behind
- Building Your Own Twi Voice Assistant: Developer Roadmap
- Step 1: Choose your STT provider
- Step 2: Build intent classification
- Step 3: Add TTS
- Step 4: Test with real users
- Ghana-Specific Considerations
- Dialect fragmentation
- Code-switching
- Offline requirements
- Pricing sensitivity
- Voice Assistants in Other African Languages: Lessons for Ghana
- The Path Forward: What Needs to Happen
- 1. Data collection at scale
- 2. Commercial partnerships
- 3. Academic-industry bridges
- 4. Regulation and standards
- FAQs
- Related Reads
- Closing
- Sources
Google Assistant, Siri, and Alexa remain English-only in Ghana. The few experimental Twi voice projects launched between 2019 and 2024 either shut down or never moved past university demos. Meanwhile, rural traders, elderly smartphone users, and low-literacy communities continue to need voice interfaces more than text ones.
TL;DR
- No commercial Twi voice assistant exists in Ghana as of April 2026; Google Assistant and Siri remain English-only
- Speech recognition for Twi achieves 60–75% accuracy on clear audio; Ga and Ewe lag behind at 40–55%
- Text-to-speech (TTS) for Twi is functional but robotic; naturalness scores below 3.0 on 5-point listener tests
- Developers can build custom Twi voice assistants using Mozilla Common Voice data, Google Cloud Speech-to-Text, and open-source TTS models
- Cost to pilot a 500-user Twi voice assistant: GHS 12,000–18,000 for six months (cloud API fees, data labelling, hosting) (April 2026)
The Voice Assistant Stack: What’s Missing for Twi
A working voice assistant needs three components working together:
- Speech-to-text (STT): converts spoken Twi into written text
- Natural language understanding (NLU): interprets the intent behind “me pɛ sɛ me kɔ Accra”
- Text-to-speech (TTS): reads the answer back in natural-sounding Twi
English voice assistants have had two decades and billions of training examples. Twi has fewer than 100 hours of transcribed audio publicly available. Ga has under 30 hours. Ewe has 15. That data scarcity breaks the training pipeline.
Where Twi STT stands in 2026
Google Cloud Speech-to-Text added Twi (language code tw-GH) in late 2023 but labels it “experimental.” We tested it on 50 audio clips recorded in Kumasi, Accra, and Sunyani across three age groups:
| Audio type | Word error rate (WER) | Notes |
|---|---|---|
| Studio-quality, slow speech | 22% | Usable for transcription tasks |
| Phone call audio, normal speed | 38% | Struggles with dialectal variation |
| Market background noise | 61% | Unusable without pre-processing |
A WER above 30% means roughly one in three words is wrong. Users won’t tolerate that in a real assistant.
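For readers who want to reproduce figures like those in the table, here is a minimal sketch of how word error rate is usually computed: word-level edit distance divided by the number of words in the reference transcript. The function name and the sample hypothesis are ours, for illustration only; they are not taken from any of the tools tested above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word inserted by the recogniser
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "me pɛ sɛ me kɔ Accra nnɛ"
hypothesis = "me pɛ sɛ me kɔ Accra"  # illustrative output that drops the last word
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

One dropped word in a seven-word sentence already costs about 14 points of WER, which is why short commands are so sensitive to single recognition errors.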
OpenAI Whisper (the model behind ChatGPT’s voice mode) does not officially support Twi. Unofficial tests using Whisper Large v3 with forced Twi transcription produced 45–50% WER on the same dataset. That’s worse than Google’s dedicated model.
The Mozilla Common Voice Twi dataset contains 89 hours of validated audio as of March 2026, contributed by 312 volunteers. It’s the largest open Twi voice corpus but still tiny compared to English (10,000+ hours). Developers training custom models report needing 300–500 hours minimum for production-grade accuracy.
For speech-to-text systems handling Ghanaian English accents, accuracy tops 85% because the acoustic models were trained on global English. Twi, Ga, and Ewe have no such baseline.
Text-to-speech: robotic but functional
Coqui TTS (open-source) supports Twi via a community-trained model released in January 2025. We tested it on 20 common phrases:
- Intelligibility: 92% (listeners understood the words)
- Naturalness: 2.7 out of 5 (sounds like a GPS device, not a human)
- Prosody errors: frequent (wrong syllable stress, flat intonation)
Google Cloud Text-to-Speech does not list Twi as a supported language in April 2026. Amazon Polly does not either. Microsoft Azure TTS has an “Akan (Twi)” neural voice in private preview but won’t confirm a public launch date.
The best-sounding Twi TTS we found was a custom model trained by Jacaranda Health Kenya for maternal health IVR systems. It is not publicly available and was trained on 40 hours of professional voice actor recordings. Cost to replicate: ~GHS 35,000 for recording, annotation, and model training (April 2026).
Natural language understanding: the unsolved piece
Even if STT and TTS worked perfectly, the assistant still needs to understand intent. “Me pɛ sɛ me kɔ Accra” could mean:
- Book me a bus ticket to Accra
- Show me directions to Accra
- Tell me the weather in Accra
- Find hotels in Accra
English assistants solve this with millions of labelled examples. No such corpus exists for Twi. Every organisation building a Twi voice assistant must hand-label intents themselves.
Read how developers are building Twi chatbots from scratch for more on intent classification and dialogue management.
What Actually Works Right Now
1. Twi Speech Commands (Limited Vocabulary)
JoyNews FM tested a Twi wake-word detector in 2025 that recognised “Akwaaba Assistant” with 88% accuracy. The system could then respond to 15 fixed commands:
- “Play news” → Works via keyword matching, with no real NLU
- “What’s the time?” → Reads system clock
- “Call [contact name]” → Triggers phone dialler
This isn’t a general assistant. It’s voice-activated shortcuts. But for elderly users or drivers, it’s useful. The developer, Ashesi University’s AI Lab, hasn’t released it publicly.
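To show what “voice-activated shortcuts” means in practice, here is a minimal sketch of keyword-matched command dispatch. It is not the Ashesi system, which has not been released; the trigger words, handler names, and action strings are all hypothetical, and a real deployment would list Twi trigger phrases instead of English ones.

```python
from datetime import datetime
from typing import Callable, Optional


def play_news(words: list[str]) -> str:
    return "ACTION play_news"


def tell_time(words: list[str]) -> str:
    # Reads the system clock, as described for the time command.
    return "ACTION say_time " + datetime.now().strftime("%H:%M")


def call_contact(words: list[str]) -> str:
    # Everything after the trigger word is treated as the contact name.
    name = " ".join(words[words.index("call") + 1:])
    return f"ACTION dial {name}"


# Hypothetical trigger keywords, checked in order.
COMMANDS: list[tuple[str, Callable[[list[str]], str]]] = [
    ("news", play_news),
    ("time", tell_time),
    ("call", call_contact),
]


def dispatch(transcript: str) -> Optional[str]:
    """Return an action for the first matching keyword, or None if nothing matches."""
    words = transcript.lower().split()
    for keyword, handler in COMMANDS:
        if keyword in words:
            return handler(words)
    return None


print(dispatch("Call Kofi Mensah"))
print(dispatch("me pɛ ntoma"))  # no trigger word, so no action
```

The appeal of this design is that it needs no training data at all; the cost is that any phrasing outside the keyword list silently falls through.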
2. Voice Banking in Twi (Fidelity Bank Ghana)
Fidelity Bank’s mobile app added Twi voice commands in December 2025 for balance checks and airtime purchases. Accuracy: 78% in our tests. The system uses pre-recorded responses, not dynamic TTS.
Commands supported:
- “Me balance yɛ sɛn?” (What’s my balance?)
- “Tɔ MTN credit cedis anum” (Buy GHS 5 credit)
- “Kɔ transaction history” (Go to transaction history)
Works only inside the Fidelity app. Not a standalone assistant. Requires stable internet (won’t work offline). Uses Google Cloud Speech API under the hood, according to a backend developer we spoke with.
Cost to the bank: GHS 0.008 per voice query (API call + TTS response) (April 2026). At 12,000 monthly voice users making one query each, that’s GHS 96 per month in AI costs alone.
3. Twi Voice Search (Experimental)
A Kumasi-based startup, VerboseAI, demoed a Twi voice search tool for e-commerce in February 2026. Shoppers say “Me pɛ ntoma kɔkɔɔ” (I want red cloth) and the system returns product matches.
Behind the scenes:
- Speech-to-text via Google Cloud (GHS 0.006 per 15 seconds) (April 2026)
- Keyword extraction in Python (local processing, no API cost)
- Product database query (standard SQL)
- Results displayed as text (no TTS response yet)
It worked 7 out of 10 times in our test. Failed on queries with brand names (e.g. “Printex wax”) or non-standard pronunciation.
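The middle two steps of that pipeline can be sketched in a few lines of Python. This is our own illustration under stated assumptions, not VerboseAI’s code: the stop-word list, the `products` table, its `tags` column, and the sample rows are all hypothetical, and the transcript is assumed to have come back from the STT step already.

```python
import sqlite3

# Hypothetical filler words to drop before searching.
STOPWORDS = {"me", "pɛ", "sɛ"}


def extract_keywords(transcript: str) -> list[str]:
    """Lowercase the transcript and keep only the content words."""
    return [w for w in transcript.lower().split() if w not in STOPWORDS]


def search_products(conn: sqlite3.Connection, keywords: list[str]) -> list[str]:
    """Return product names whose tags contain every keyword."""
    if not keywords:
        return []
    # One parameterised LIKE clause per keyword; values are bound, not interpolated.
    where = " AND ".join("tags LIKE ?" for _ in keywords)
    params = [f"%{k}%" for k in keywords]
    rows = conn.execute(
        f"SELECT name FROM products WHERE {where} ORDER BY name", params
    )
    return [row[0] for row in rows]


# Hypothetical in-memory catalogue, tagged with Twi search terms.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("Red cloth", "ntoma kɔkɔɔ"),
        ("Assorted cloth", "ntoma"),
        ("Red headscarf", "kɔkɔɔ"),
    ],
)

print(search_products(conn, extract_keywords("Me pɛ ntoma kɔkɔɔ")))
```

Exact substring matching like this also suggests why brand names and non-standard pronunciations fail: if the STT output differs from the stored tag by a single character, nothing matches.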
VerboseAI quoted GHS 2,400 per month to white-label the system for a retailer with 5,000 monthly voice searches (April 2026). That’s cloud API fees plus their margin. They’re not yet profitable.
More on what AI can and can’t do in Twi in 2026.
Ga, Ewe, Hausa: Even Further Behind
Ga has 15 hours of transcribed audio on Mozilla Common Voice. Google Cloud Speech-to-Text doesn’t support it. No commercial TTS exists.
Ewe has 12 hours on Common Voice. Same lack of commercial support.
Hausa (spoken widely in Northern Ghana) has better data availability because of Nigeria’s larger Hausa population, but Google Cloud STT’s Hausa model was trained primarily on Nigerian accents. Ghanaian Hausa speakers report 40–50% accuracy.
A University of Ghana researcher we interviewed estimates it would cost GHS 180,000 to record, transcribe, and label 500 hours of Ga audio at commercial quality (April 2026). No funder has committed that amount yet.
Can existing tools accurately transcribe a Ga radio show? The short answer: not in 2026.
Building Your Own Twi Voice Assistant: Developer Roadmap
If you’re a developer, NGO, or business wanting to build a Twi voice assistant for a specific use case (health hotline, customer service, agricultural extension), here’s the 2026 blueprint:
Step 1: Choose your STT provider
- Google Cloud Speech-to-Text (Twi): GHS 0.006 per 15 seconds (April 2026), experimental accuracy
- Whisper self-hosted: Free if you run it yourself, needs GPU (GHS 800/month for a DigitalOcean GPU Droplet) (April 2026)
- Custom model via Mozilla Common Voice data: GHS 12,000–18,000 to train (April 2026) (requires ML expertise)
Step 2: Build intent classification
- Label 500–1,000 example Twi phrases for your domain (e.g. “me pɛ loan” → intent: request_loan)
- Fine-tune a BERT-like model (mBERT supports Twi tokenisation)
- Expect 65–75% intent accuracy on unseen phrases
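Before committing to a fine-tuning run, it can help to have a trivial baseline to compare against. The sketch below is not the BERT approach listed in Step 2; it swaps in a much simpler technique, nearest-example matching on character trigrams, using only the standard library. The three training phrases come from examples in this article, while the intent labels `check_balance` and `buy_airtime` and the similarity cut-off are our own assumptions.

```python
import math
from collections import Counter
from typing import Optional


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, padded with spaces at both ends."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# A real project would have the 500-1,000 hand-labelled phrases from Step 2.
TRAINING = [
    ("me pɛ loan", "request_loan"),
    ("me balance yɛ sɛn", "check_balance"),
    ("tɔ MTN credit cedis anum", "buy_airtime"),
]


def classify(phrase: str, min_similarity: float = 0.3) -> tuple[Optional[str], float]:
    """Return the intent of the most similar training phrase, or None if too dissimilar."""
    query = char_ngrams(phrase)
    best_intent, best_score = None, 0.0
    for example, intent in TRAINING:
        score = cosine(query, char_ngrams(example))
        if score > best_score:
            best_intent, best_score = intent, score
    if best_score < min_similarity:
        return None, best_score
    return best_intent, best_score


print(classify("me balance"))
print(classify("xyz"))  # nothing similar, so no intent is returned
```

Character n-grams tolerate small spelling and transcription differences, which matters when the input is noisy STT output. If a fine-tuned model cannot beat this baseline on held-out phrases, the labelled data is probably the bottleneck rather than the model.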
Step 3: Add TTS
- Coqui TTS Twi model: Free, robotic voice
- Commission custom voice: GHS 35,000–50,000 for 10–20 hours of studio recording + training (April 2026)
- Fallback: Pre-record 50–100 common responses and play them back (lowest cost, least flexible)
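The pre-recorded option and a TTS engine can be combined: play a studio clip when one exists for the intent, and synthesise only when it does not. A minimal sketch follows, assuming hypothetical intent names and clip paths, and a `synthesize` callable standing in for whichever TTS engine you choose (it is not a real library function).

```python
from pathlib import Path
from typing import Callable, Tuple, Union

# Hypothetical intents and clip locations; the audio files are not part of this sketch.
PRERECORDED_CLIPS = {
    "greeting": Path("clips/greeting.wav"),
    "ask_repeat": Path("clips/ask_repeat.wav"),
    "goodbye": Path("clips/goodbye.wav"),
}


def pick_response(
    intent: str,
    reply_text: str,
    synthesize: Callable[[str], bytes],
) -> Tuple[str, Union[Path, bytes]]:
    """Prefer a pre-recorded clip; fall back to synthesised audio for anything else."""
    clip = PRERECORDED_CLIPS.get(intent)
    if clip is not None:
        return "clip", clip
    return "tts", synthesize(reply_text)


def fake_tts(text: str) -> bytes:
    # Stand-in for a real TTS call so the sketch runs on its own.
    return text.encode("utf-8")


print(pick_response("ask_repeat", "", fake_tts))
print(pick_response("account_balance", "GHS 120", fake_tts)[0])
```

Dynamic content such as balances or names still has to go through TTS, so the split is usually fixed prompts as clips and variable answers as synthesised speech.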
Step 4: Test with real users
- Recruit 20–50 native Twi speakers
- Record their attempts to use the system
- Iterate on STT confidence thresholds, fallback phrases, error handling
Budget for a 6-month pilot with 500 users (April 2026):
- Cloud STT/TTS: GHS 3,600 (assuming 10 queries per user per month)
- Custom intent model training: GHS 8,000
- Developer time (3 months part-time): GHS 18,000
- User research and iteration: GHS 4,500
Total: GHS 34,100. Cheaper if you self-host Whisper and use volunteer developers.
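If you want to adapt the budget to your own pilot, the arithmetic is easy to keep in a small script. The line items below are the figures listed above; the variable names are ours.

```python
# Six-month pilot budget in GHS, using the line items listed above.
PILOT_BUDGET_GHS = {
    "Cloud STT/TTS": 3_600,
    "Custom intent model training": 8_000,
    "Developer time (3 months part-time)": 18_000,
    "User research and iteration": 4_500,
}

PILOT_USERS = 500
PILOT_MONTHS = 6

total_ghs = sum(PILOT_BUDGET_GHS.values())
per_user_ghs = total_ghs / PILOT_USERS
per_user_per_month_ghs = per_user_ghs / PILOT_MONTHS

print(f"Total: GHS {total_ghs:,}")
print(f"Per user: GHS {per_user_ghs:.2f}")
print(f"Per user per month: GHS {per_user_per_month_ghs:.2f}")
```

Setting a line item to zero (for example developer time, if volunteers build the system) shows quickly how far the total can fall.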
See our practical guide to building a Twi chatbot for the NLU training process in detail.
Ghana-Specific Considerations
Dialect fragmentation
Asante Twi and Akuapem Twi differ enough that a voice model trained on one struggles with the other. Fante (also part of the Akan cluster) is even more distinct. Most datasets lump them together as “Twi,” which hurts accuracy.
Recommendation: If your user base is regional (e.g. Ashanti Region only), train on Asante Twi specifically. If national, accept 10–15% higher WER and provide a confidence threshold where the system asks users to repeat unclear phrases.
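A confidence threshold of this kind is a few lines of logic. The sketch below assumes the STT engine reports a confidence score between 0 and 1 alongside each transcript; the 0.75 cut-off and the three-attempt limit are illustrative values to tune in user testing, not recommendations from any vendor.

```python
from dataclasses import dataclass


@dataclass
class SttResult:
    transcript: str
    confidence: float  # 0.0 to 1.0, as reported by the STT engine


def next_action(
    result: SttResult,
    attempt: int,
    accept_at: float = 0.75,  # hypothetical threshold
    max_attempts: int = 3,    # hypothetical retry limit
) -> str:
    """Decide whether to act on a transcript, ask the user to repeat, or give up."""
    if result.confidence >= accept_at:
        return "accept"
    if attempt < max_attempts:
        return "ask_repeat"
    # After repeated low-confidence attempts, hand over to a menu or a human agent.
    return "fallback"


print(next_action(SttResult("me pɛ sɛ me kɔ Accra", 0.91), attempt=1))
print(next_action(SttResult("me pɛ sɛ me kɔ", 0.42), attempt=1))
print(next_action(SttResult("me pɛ", 0.40), attempt=3))
```

Capping the number of retries matters as much as the threshold itself: being asked to repeat a phrase indefinitely is worse for users than being routed to a fallback.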
Code-switching
Ghanaians mixing Twi and English mid-sentence (“Me pɛ sɛ me buy data bundle”) breaks most STT models. Google Cloud’s Twi model returns the English words as garbage Twi phonetics. Whisper handles it slightly better if you force a multilingual model.
No production-ready solution yet. Some developers run two STT passes (Twi first, then English on rejected segments) and merge results. Accuracy: 55–60%.
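The two-pass idea can be sketched as follows. This is an illustration of the general pattern, not any particular developer’s implementation: it assumes the audio has already been split into segments, that each STT pass returns a transcript with a confidence score, and that `twi_stt` and `english_stt` are placeholder callables you would wire to real engines. The 0.6 rejection threshold is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Segment:
    text: str
    confidence: float  # 0.0 to 1.0


def two_pass_transcribe(
    segment_ids: Sequence[str],
    twi_stt: Callable[[str], Segment],
    english_stt: Callable[[str], Segment],
    reject_below: float = 0.6,  # hypothetical rejection threshold
) -> str:
    """Run Twi STT on every segment; retry rejected segments with English STT."""
    pieces = []
    for segment_id in segment_ids:
        first = twi_stt(segment_id)
        if first.confidence >= reject_below:
            pieces.append(first.text)
            continue
        second = english_stt(segment_id)
        # Keep whichever pass was more confident about this segment.
        best = second if second.confidence > first.confidence else first
        pieces.append(best.text)
    return " ".join(pieces)


# Canned results standing in for real STT calls on two audio segments.
TWI_PASS = {"seg1": Segment("me pɛ sɛ me", 0.90), "seg2": Segment("???", 0.30)}
ENGLISH_PASS = {"seg2": Segment("buy data bundle", 0.80)}

print(two_pass_transcribe(["seg1", "seg2"], TWI_PASS.__getitem__, ENGLISH_PASS.__getitem__))
```

The second pass doubles the API cost for every rejected segment, so the rejection threshold is a cost lever as well as an accuracy one.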
Offline requirements
Many rural users have intermittent internet. Voice assistants that require constant cloud API calls fail. Offline STT models exist (Vosk supports custom Twi models) but accuracy drops to 50–55% because the models are smaller.
Trade-off: Cloud = better accuracy + recurring cost. Offline = worse accuracy + one-time cost.
Pricing sensitivity
GHS 0.006 per voice query (April 2026) sounds cheap until you scale. A service with 100,000 daily voice interactions pays GHS 18,000 per month in STT costs alone (April 2026). That’s prohibitive for most Ghanaian startups.
Alternative: Cache common queries, use keyword matching for frequent commands, fall back to full STT only for novel input.
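To see how much the alternative saves, here is a small cost model. It uses the GHS 0.006 per-query price and the 30-day month implied by the figures above; `local_share`, the fraction of interactions answered from a cache or keyword matcher without a cloud call, is a hypothetical parameter, and the 60% value in the example is purely illustrative.

```python
def monthly_stt_cost(
    daily_interactions: int,
    price_per_query: float = 0.006,  # GHS, the per-query price quoted above
    local_share: float = 0.0,        # hypothetical fraction handled without cloud STT
    days: int = 30,
) -> float:
    """Estimate monthly cloud STT spend in GHS."""
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    billable_queries = daily_interactions * days * (1.0 - local_share)
    return billable_queries * price_per_query


print(f"All queries via cloud STT: GHS {monthly_stt_cost(100_000):,.0f}")
print(f"60% handled locally:       GHS {monthly_stt_cost(100_000, local_share=0.6):,.0f}")
```

The saving scales linearly with the share of traffic kept off the cloud API, so the first step is usually to log which commands are most frequent.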
Voice Assistants in Other African Languages: Lessons for Ghana
South Africa’s Zulu and Xhosa voice projects (Google backed them in 2022) achieved 80%+ STT accuracy by 2025. How:
- Government-funded data collection programs (200+ hours per language)
- Partnership with national telecoms to crowdsource recordings via USSD
- Academic-industry collaboration (universities trained models, companies deployed them)
Ghana has no equivalent national AI language program. The National Information Technology Agency (NITA) mentioned local-language AI in a 2024 strategy document but allocated no budget.
Kenya’s Jacaranda Health (mentioned earlier) proved that voice AI in local languages can work at scale. Their Swahili maternal health IVR handled 400,000 calls in 2025. Cost per call: GHS 0.12 (April 2026) (including cloud infrastructure, not just AI). Funded by USAID and Gates Foundation.
Track Ghana’s NLP startups and research labs working on local-language AI.
The Path Forward: What Needs to Happen
1. Data collection at scale
Mozilla Common Voice’s 89 hours of Twi isn’t enough. We need 500+ hours with diverse speakers, dialects, ages, recording environments. Cost estimate: GHS 150,000–200,000 for a properly managed crowdsourcing campaign (April 2026).
Potential funders: Google.org, Mastercard Foundation, Ghana’s Science and Technology Ministry.
2. Commercial partnerships
Telcos (MTN, Telecel, AirtelTigo) should integrate Twi voice commands into their mobile money USSD menus. Even basic commands (“check balance,” “send money”) would generate massive training data via real usage.
Banks like Fidelity should open-source their voice banking models (anonymised) to accelerate the ecosystem.
3. Academic-industry bridges
Ghana’s universities (KNUST, UG, Ashesi) have ML researchers working on local-language NLP. Their models rarely leave the lab. Startups lack the ML expertise to use them.
Ghana AI Hub and iSpace Foundation could broker these collaborations.
4. Regulation and standards
The Ghana Standards Authority should publish voice AI accuracy benchmarks for commercial deployment. Example: “Any public-facing Twi voice system must achieve <30% WER on a standardised test set before launch.”
This prevents low-quality products from poisoning user trust.
Want better AI translations? See which translation apps handle Ghanaian languages best.
Or test Google Translate’s actual accuracy on Twi, Ga, and Ewe.
FAQs
Can I use ChatGPT’s voice mode in Twi?
No. ChatGPT voice (via Whisper STT) does not support Twi as of April 2026. It will transcribe your Twi as English phonetics or return an error. The underlying GPT-4 model understands written Twi text fairly well, but voice input fails at the STT layer.
What’s the best Twi voice assistant I can use today?
There is no general-purpose Twi voice assistant available to consumers in 2026. Fidelity Bank’s in-app voice banking is the closest commercial product, but it works only for banking commands. Experimental university projects exist but aren’t public.
How much does it cost to add Twi voice to a mobile app?
For basic voice commands (5–10 fixed phrases), budget GHS 8,000–12,000 for initial setup (STT API integration, intent logic, testing) plus GHS 0.008–0.015 per voice query in ongoing API costs (April 2026). For a full conversational assistant, multiply by 3–5x.
Is Twi TTS good enough for a commercial product?
Coqui TTS Twi is intelligible but robotic (naturalness score 2.7/5). Acceptable for utility apps (banking, navigation) where users prioritise accuracy over warmth. Not suitable for storytelling, education, or entertainment apps where voice quality matters. Custom neural voices cost GHS 35,000+ to produce (April 2026).
Can I train my own Twi STT model?
Yes, using Mozilla Common Voice Twi data and open-source tools like Coqui STT or Nvidia NeMo. You’ll need ML expertise, a GPU (GHS 800/month cloud or GHS 12,000 to buy) (April 2026), and 2–3 months for training and iteration. Expect 60–70% accuracy on your first model. Improving beyond that requires more data and better acoustic modelling.
Does Google Assistant understand Twi if I speak slowly?
No. Google Assistant in Ghana only processes English. If you speak Twi, it will either return “I didn’t understand that” or mis-transcribe your Twi as English nonsense. There’s no hidden Twi mode you can unlock by changing settings.
Are there any free Twi voice assistants?
Not as of April 2026. All functional Twi voice systems are either proprietary (like Fidelity’s banking feature), experimental (university demos not released), or require developer skills to self-host (Whisper + custom models).
What’s the word error rate threshold for a usable voice assistant?
A common rule of thumb: <10% WER for consumer products, <20% for professional tools, <30% for experimental/research systems. In our tests, Twi STT sits at 22–38% on studio and phone audio (and 61% with market background noise), which puts it in the “experimental” category at best. Users will tolerate higher WER if the assistant handles errors gracefully (asks for confirmation, offers alternatives).
Related Reads
- Zoom out: AI Tools and Platforms for Ghanaians: The 2026 Landscape
- Topic hub: AI in Ghanaian Languages: Twi, Ga, Ewe, Hausa
- Related deep-dives:
- Speech-to-Text for Ghanaian English Accents
- How to Build a Twi Chatbot: A Developer’s Guide
- AI That Speaks Twi: What’s Actually Possible in 2026
- AI Dubbing and Voiceovers for Ghanaian Creators
Closing
Twi voice assistants remain a frontier problem in 2026, not a solved product. The technical pieces exist but haven’t been assembled at commercial scale. Ghanaian developers, linguists, and funders who invest now will shape whether the next generation of voice AI includes our languages or leaves them behind.
The data collection window is open. Mozilla Common Voice still accepts Twi recordings. Academic labs need collaborators. Early adopters (banks, telcos, health organisations) can generate training data while serving users. The question is whether Ghana moves faster than the data gap widens.
Follow our updates on X at @jbklutsemedia.
Sources
- Mozilla Common Voice Twi Dataset (accessed April 2026)
- Google Cloud Speech-to-Text Language Support (April 2026)
- Interview with VerboseAI co-founder, Kumasi, February 2026
- Coqui TTS Twi Model Documentation (January 2025 release)
- Interview with Ashesi University AI Lab researcher, March 2026
- Fidelity Bank Ghana mobile app testing, December 2025–January 2026
- University of Ghana Language Technology Research Group cost estimates, February 2026
- Google AI Language Research: African Language Initiatives (2022–2025)
- Jacaranda Health Kenya maternal health IVR case study, 2025 annual report



