Building a twi chatbot means solving for a language with limited training data, no official Unicode keyboard standard until recently, and grammar rules that differ sharply from English-centric NLP models. Ghanaian developers building customer service bots for banks, e-commerce sites, or government portals face these constraints daily. This guide walks you through dataset sourcing, model selection, API integration, and deployment options that work for teams in Accra, Kumasi, or remote, with budgets under GHS 5,000 (April 2026) for a pilot.
Table of Contents
- TL;DR
- Why Twi Chatbots Matter in Ghana
- Step 1: Define Scope and Intent Architecture
- Step 2: Choose Your NLP Approach
- Rule-Based (Keyword Matching)
- Pre-Trained Multilingual Models (Fine-Tuned)
- Commercial APIs (OpenAI, Anthropic, Google)
- Step 3: Collect and Clean Training Data
- Free Twi Corpora
- Label Your Domain Data
- Step 4: Fine-Tune Your Model
- Setup
- Training Script
- Step 5: Build the Chatbot Logic Layer
- Flask API Example
- Step 6: Integrate with User-Facing Channels
- WhatsApp Business API
- Web Widget
- USSD (For Feature Phones)
- Step 7: Test with Real Users
- Step 8: Monitor and Improve
- Ghana-Specific Considerations
- Regulatory Compliance
- Pricing for Ghanaian Teams
- Local Language Nuances
- Telco-Specific Integrations
- Common Pitfalls
- Tools and Libraries Recap
- FAQs
- Related Reads
- Closing
- Sources
The payoff is real. MTN Ghana and Fidelity Bank have both piloted Twi-capable chatbots for account inquiries and mobile money support. The technology is no longer experimental, it is production-ready if you choose the right stack.
TL;DR
- Start with rule-based logic for common queries, layer in NLP only where needed
- Use existing Twi corpora (JW300, NLLB, Akuapem Twi Bible) plus your own transcripts
- Fine-tune mT5 or NLLB-200 models on Colab, costs under GHS 200 (April 2026) for compute
- Deploy on WhatsApp Business API (most Ghanaians already there) or web widget
- Plan for fallback to human agents when confidence score drops below 70%
Why Twi Chatbots Matter in Ghana
Twi is the first language for 8 million Ghanaians and a second language for millions more across Ashanti, Eastern, Central, Western, and parts of Brong-Ahafo regions. Customer service lines at telcos, banks, and utility providers report 40, 60% of inbound calls are in Twi, not English. A chatbot that handles “Mepɛ sɛ mehwɛ me sika a ɛwɔ me akonta mu” (I want to check my account balance) reduces queue time and cost per contact.
The business case is straightforward. A single human agent costs GHS 2,500, 4,000 per month (April 2026). A chatbot handling 200 tier-1 queries per day pays for itself in 60, 90 days, even with engineer time factored in.
Step 1: Define Scope and Intent Architecture
Start narrow. Do not attempt a general-purpose Twi assistant on day one. Pick 8, 12 intents your users ask most often.
Examples for a bank chatbot:
– Check account balance
– Transfer money to mobile money
– Report lost card
– Ask about loan eligibility
– Complain about unauthorised debit
– Request mini-statement
– Change PIN
– Speak to a human
Map each intent to expected Twi phrases. Collect real transcripts from your call centre or WhatsApp logs if you have them. If not, hire a Twi-fluent researcher on Upwork Ghana (GHS 50, 150 per hour, April 2026) to generate 20, 30 variations per intent.
Step 2: Choose Your NLP Approach
You have three paths:
Rule-Based (Keyword Matching)
Cheapest and fastest to deploy. Works when vocabulary is narrow and predictable.
Example logic:
IF message contains "sika" AND ("hwɛ" OR "chɛk"):
RETURN balance_check_intent
Pros: No model training, no API costs, runs locally.
Cons: Brittle. Breaks on typos, slang, or rephrasing.
Good for: MVPs with < 10 intents, regulated environments (government portals) where you control input format.
Pre-Trained Multilingual Models (Fine-Tuned)
Mid-tier cost and accuracy. Train on your own data.
Top models for Twi:
– NLLB-200 (Meta’s No Language Left Behind): Supports Twi (Akuapem and Asante variants). Open-source. 1.3B parameters for the distilled version.
– mT5 (Google): Multilingual T5. Covers 101 languages including Twi. Fine-tunable on modest hardware.
– AfroLM (Masakhane): African-language-first LLM. Twi support improving as of 2026.
Cost to fine-tune on Google Colab Pro: GHS 80, 200 (April 2026) (GPU runtime for 4, 8 hours).
Accuracy after fine-tuning: 75, 85% intent classification on 500+ labelled examples.
Good for: Production chatbots handling 50, 500 conversations per day.
Commercial APIs (OpenAI, Anthropic, Google)
Highest accuracy, highest cost.
OpenAI’s GPT-4 and Claude 3.5 Sonnet can handle Twi queries via prompt engineering. You provide a system prompt with Twi context and examples, the model infers intent.
Cost: USD 0.01, 0.06 per query (~GHS 0.11, 0.67 at April 2026 rates).
Accuracy: 85, 92% out of the box for common queries.
Good for: Enterprise deployments with budgets over GHS 10,000/month (April 2026), or services where accuracy matters more than cost (healthcare triage, legal aid).
Most Ghanaian startups choose path 2 (fine-tuned open-source models) for the cost-accuracy balance.
Step 3: Collect and Clean Training Data
Twi has three main dialects: Akuapem (literary standard), Asante (most widely spoken), and Fante (distinct enough that some linguists class it separately). Your chatbot should recognise all three, but you can start with Asante if budget is tight.
Free Twi Corpora
| Source | Size | Notes |
|---|---|---|
| JW300 (Jehovah’s Witness translations) | ~300k sentence pairs (Twi-English) | Religious domain, formal register |
| NLLB seed data | ~50k sentences | Crowdsourced via Masakhane |
| Akuapem Twi Bible | ~31k verses | Public domain, Akuapem dialect |
| Ghana NLP’s Twi Common Crawl | ~80k web sentences | Noisy, needs cleaning |
| Your own call logs | Varies | Gold standard for your domain |
Download JW300 from OPUS. Download NLLB data from Meta’s repo. Clean punctuation and normalise spelling (Twi has no standardised orthography, you’ll see “ɛ” vs “e”, “ɔ” vs “o”).
Label Your Domain Data
Export 500, 1,000 recent customer queries from your system. Hire Twi annotators on Sama, Remotasks, or local university students (GHS 5, 10 per 100 labelled examples, April 2026). Each query gets an intent label.
Store in CSV:
text,intent
"Mepɛ sɛ mehwɛ me sika",balance_check
"Dɛn na ɛbɛyɛ ansa na manya loan?",loan_eligibility
"Me card ayera",lost_card
Split 70% train, 15% validation, 15% test.
Step 4: Fine-Tune Your Model
We’ll use NLLB-200 distilled (1.3B parameters) as the example. Runs on a single T4 GPU (free tier on Colab).
Setup
Install Hugging Face Transformers:
pip install transformers datasets sentencepiece accelerate
Load the model:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="twi_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
Training Script
Fine-tune on your labelled intents:
from transformers import Trainer, TrainingArguments
from datasets import load_dataset
# Load your CSV
dataset = load_dataset('csv', data_files='twi_intents.csv')
def preprocess(examples):
inputs = tokenizer(examples['text'], max_length=128, truncation=True)
labels = tokenizer(examples['intent'], max_length=32, truncation=True)
inputs["labels"] = labels["input_ids"]
return inputs
tokenized_data = dataset.map(preprocess, batched=True)
training_args = TrainingArguments(
output_dir="./twi-chatbot-model",
per_device_train_batch_size=8,
num_train_epochs=3,
learning_rate=5e-5,
save_steps=500,
logging_steps=100
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_data['train'],
eval_dataset=tokenized_data['validation']
)
trainer.train()
Runtime on Colab T4: 2, 4 hours for 500 examples. Cost if you exceed free tier: USD 5.50 (~GHS 61 at April 2026 rates) for 8 hours GPU.
Save the model locally:
model.save_pretrained("./twi-chatbot-final")
tokenizer.save_pretrained("./twi-chatbot-final")
Step 5: Build the Chatbot Logic Layer
Your model outputs intent labels. Now map intents to actions.
Flask API Example
from flask import Flask, request, jsonify
from transformers import pipeline
app = Flask(__name__)
classifier = pipeline("text-classification", model="./twi-chatbot-final")
INTENT_RESPONSES = {
"balance_check": "Mehwɛ wo akonta mu... Wo sika yɛ GHS {balance}.",
"loan_eligibility": "Fa wo nkyerɛwde bra kohu sɛ wubetumi anya loan.",
"lost_card": "Yɛde wo card no bɛto hɔ ntɛm. Frɛ 0800-HELP.",
}
@app.route('/chat', methods=['POST'])
def chat():
user_message = request.json['message']
result = classifier(user_message)[0]
intent = result['label']
confidence = result['score']
if confidence < 0.7:
return jsonify({"response": "Me ntee no yie. Human agent bɛka wo ho nkɔm."})
response = INTENT_RESPONSES.get(intent, "Me nnim ɛno ho asɛm.")
return jsonify({"response": response, "intent": intent})
if __name__ == '__main__':
app.run(port=5000)
Host on Heroku (free tier supports low traffic) or DigitalOcean Droplet (USD 6/month = ~GHS 67 at April 2026 rates).
Step 6: Integrate with User-Facing Channels
WhatsApp Business API
Most Ghanaians use WhatsApp daily. Integrating here gives you instant reach.
Steps:
1. Apply for WhatsApp Business API access via Meta Business.
2. Use a Business Solution Provider like Twilio (pricing: USD 0.005, 0.02 per message = ~GHS 0.06, 0.22 at April 2026 rates).
3. Webhook your Flask API to Twilio’s endpoint.
Sample webhook handler:
@app.route('/whatsapp', methods=['POST'])
def whatsapp_reply():
incoming_msg = request.values.get('Body', '')
from_number = request.values.get('From', '')
# Call your chatbot logic
bot_response = chat_internal(incoming_msg)
# Send reply via Twilio
client.messages.create(
body=bot_response,
from_='whatsapp:+233XXXXXXX',
to=from_number
)
return '', 200
Ghana-specific tip: Register your WhatsApp Business account with a local +233 number. Users trust local numbers more than foreign ones.
Web Widget
Embed a chat widget on your site using Botpress (free tier) or Rasa (open-source). Connect the widget to your Flask API backend.
Cost: Free if self-hosted. Botpress Cloud charges USD 10/month (~GHS 111 at April 2026 rates) for 5,000 conversations.
USSD (For Feature Phones)
Older Ghanaians and rural users rely on USSD. Integrate via your telco’s USSD gateway (MTN, Telecel, AirtelTigo). Pricing: GHS 0.02, 0.05 per session (April 2026).
USSD menus are text-only and session-based. Your chatbot can handle intent classification, but responses must be short (160 characters max per screen).
Step 7: Test with Real Users
Deploy a beta to 50, 100 users. Track:
– Intent classification accuracy (aim for 80%+)
– Fallback rate (how often users get escalated to humans)
– Average session length
– User satisfaction (ask “Was this helpful?” at end of chat)
Use Labelbox or Argilla to review misclassified queries and retrain weekly.
Step 8: Monitor and Improve
NLP models degrade as language evolves. Ghanaian Twi absorbs English loanwords fast (“me balance,” “me loan,” “data bundle”). Plan monthly retraining.
Set up alerts when confidence drops below 70% for >10% of queries. That signals your model needs fresh examples.
Ghana-Specific Considerations
Regulatory Compliance
If your chatbot collects personal data (phone numbers, account details), comply with Ghana’s Data Protection Act 2012. Register with the Data Protection Commission (fee: GHS 500, 2,000 depending on organisation size, April 2026).
Pricing for Ghanaian Teams
Full cost breakdown for a 6-month pilot:
| Item | Cost (GHS) |
|---|---|
| Colab Pro GPU time (fine-tuning) | 200 |
| DigitalOcean hosting (6 months) | 402 |
| WhatsApp Business API (5,000 msgs/month) | 792 |
| Twi annotator labour (1,000 examples) | 500 |
| Domain name + SSL | 100 |
| Contingency | 460 |
| Total | 2,454 |
Far cheaper than hiring a full-time agent (GHS 15,000 over 6 months, April 2026).
Local Language Nuances
Asante Twi speakers may write “ɔpɛ” where Akuapem speakers write “ope.” Train on both. Fante diverges more (“ɔpɛ” becomes “ɔpɛ” but grammar differs). If your user base spans Central and Western Regions, budget extra annotation time.
Telco-Specific Integrations
MTN Ghana and Telecel offer sandbox environments for USSD and SMS testing. Contact their developer relations teams. AirtelTigo’s API documentation is sparse as of April 2026, expect longer integration time.
Common Pitfalls
Overfitting on formal Twi: Bible and JW300 data use literary register. Real users say “chale” and “eiii” and code-switch to English mid-sentence. Your training data must reflect this.
Ignoring dialects: A bot trained only on Asante Twi will frustrate Akuapem and Fante speakers. Collect at least 100 examples per dialect.
No fallback plan: When confidence is low, route to a human immediately. Do not guess. Ghanaian users abandon bots that give wrong answers twice in a row.
Deploying without Ghanaian QA: Have native Twi speakers test every intent before launch. Developers who learned Twi as a second language miss slang and tone.
Tools and Libraries Recap
| Tool | Purpose | Cost |
|---|---|---|
| Hugging Face Transformers | Model training and inference | Free (open-source) |
| NLLB-200 | Twi-capable translation model | Free (open-source) |
| Google Colab | GPU compute for training | Free tier available, Pro = USD 9.99/month (~GHS 111 at April 2026 rates) |
| Twilio WhatsApp API | WhatsApp integration | USD 0.005, 0.02 per message (~GHS 0.06, 0.22 at April 2026 rates) |
| Botpress | Web widget and orchestration | Free tier, Cloud = USD 10/month (~GHS 111 at April 2026 rates) |
| Argilla | Annotation and model monitoring | Free (open-source) |
FAQs
Can I build a Twi chatbot without knowing how to code?
No-code tools like ManyChat and Chatfuel support rule-based Twi bots if you manually input all response variations. For NLP-powered bots, you need Python skills or a developer.
How accurate is Google Translate for Twi chatbot fallback?
Google Translate handles Twi-to-English at 60, 70% accuracy for simple sentences. Not reliable enough for production. See our deep-dive on Google Translate for Twi accuracy.
Which Twi dialect should I prioritise?
Asante if your users are in Ashanti, Eastern, and Greater Accra regions. Akuapem if your content is literary or educational. Fante if you serve Central and Western Regions. Ideally, train on all three.
Can I use ChatGPT or Claude for Twi queries?
Yes. Both GPT-4 and Claude 3.5 handle Twi queries with prompt engineering. Expect 80, 90% accuracy but higher cost (USD 0.045+ per query = ~GHS 0.50+ at April 2026 rates). See AI That Speaks Twi: What’s Actually Possible in 2026 for benchmarks.
How do I handle code-switching (Twi + English in one message)?
Train on mixed-language examples. Ghanaians often say “Me pɛ sɛ me check me balance.” Your model must tokenise both languages. NLLB-200 and mT5 handle this natively.
What if my chatbot gets a question it can’t answer?
Return a fallback message in Twi (“Me nnim ɛno ho asɛm. Human agent bɛboa wo.”) and route to a live agent. Track these queries to identify gaps in your training data.
How long does it take to build a production-ready Twi chatbot?
For a team with one developer and one Twi linguist: 4, 6 weeks from data collection to pilot launch. Add 2 weeks for WhatsApp API approval.
Do I need Data Protection Commission approval before launch?
Yes, if you collect personal identifiers (phone numbers, names, national ID). Register at dataprotection.org.gh. Processing time: 2, 4 weeks. Non-compliance risks fines up to GHS 50,000 (April 2026).
Related Reads
- Zoom out: AI Tools and Services in Ghana
- Topic hub: AI in Ghanaian Languages: Twi, Ga, Ewe, Hausa
- Related deep-dives:
- AI That Speaks Twi: What’s Actually Possible in 2026
- AI Voice Assistants in Local Ghanaian Languages
- Best Translation Apps for Ghanaian Languages
- Ghana NLP and Local-Language AI Startups to Watch
Closing
Twi chatbots are no longer experimental in Ghana. Banks, telcos, and government agencies are deploying them in 2026 because the ROI is proven and the tools are accessible. The hardest part is not the code, it is collecting enough quality Twi data and testing with real users who speak the language daily. Start small, ship fast, and retrain often.
If you are building a Twi chatbot or exploring other local-language AI projects, share your progress or questions with us. Follow our updates on X at @jbklutsemedia.
Sources
- Meta NLLB-200 project repository
- Hugging Face Transformers documentation
- OPUS JW300 parallel corpus
- Ghana Data Protection Commission
- Twilio WhatsApp Business API pricing
- Google Colab pricing
- Masakhane NLP for African languages



