Training a new language
Adding a new language to your AI Agent requires preparing one or two types of data, depending on whether you need text-only or voice-enabled support.
Text support vs voice support
Text support – your AI Agent recognises and responds to written messages in the target language. Requires text data only.
Voice support – your AI Agent understands spoken messages and responds with voice in the target language. Requires text data and voice data.
All new languages require text data preparation. Voice data is only required if your AI Agent will support voice channels.
Before you begin
Confirm the following with your Proto contact before starting any data work:
Target language – the language being added. If the language has regional variants, specify which one, because variants can differ significantly in vocabulary, spelling, and pronunciation. Examples: English (UK) vs English (US); French (France) vs French (Canada).
Use case – the primary purpose of your AI Agent. Examples: complaint handling, customer FAQs, sales support.
Channels – where the AI Agent will be deployed. Examples: WhatsApp, Webchat, voice IVR.
Timeline – your expected go-live date.
Why specifying the variant matters: English (UK) and English (US) are a familiar example of regional variation, and many other languages vary in the same way. Specifying the correct variant from the start ensures the AI Agent is trained on the language your audience actually uses.
Guides
Follow the relevant guide or guides based on your requirements:
Text data preparation – required for all new languages
Voice data collection – required for voice-enabled AI Agents only
Common questions
Do I need voice data if my AI Agent is text-only?
No. Voice data is only required if your AI Agent will understand spoken messages or respond out loud. For text-only deployments, you only need to follow the text data preparation guide.
Can my translator use an AI translation tool to speed up the work?
For widely spoken languages such as French, Spanish, or Arabic, AI translation tools can help with speed. For low-resource languages – languages with limited digital resources, such as many local African or Southeast Asian languages – reliable translation tools are unlikely to be available. In those cases, a human native speaker is required for all translation work.
Can existing recordings be used for voice training?
Yes. If you already have audio in the target language with matching transcripts, follow Option A in the voice data collection guide. Check with your Proto contact first to confirm that the files meet the format requirements.
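Before sending existing recordings for review, it can help to check that every audio file has a transcript and every transcript has an audio file. The sketch below is one way to do that in Python. It assumes each recording and its transcript share a file name (for example `call_001.wav` and `call_001.txt`) and sit in the same folder. The file extensions and the function name `find_unpaired` are illustrative assumptions, not Proto's format requirements, which your Proto contact will confirm.

```python
import sys
from pathlib import Path

# Assumed extensions for illustration only; the actual format
# requirements come from your Proto contact.
AUDIO_EXTS = {".wav", ".mp3", ".flac"}
TRANSCRIPT_EXT = ".txt"


def find_unpaired(folder):
    """Return two sorted lists of file names (without extension):
    audio files with no transcript, and transcripts with no audio file."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    audio = {p.stem for p in files if p.suffix.lower() in AUDIO_EXTS}
    transcripts = {p.stem for p in files if p.suffix.lower() == TRANSCRIPT_EXT}
    return sorted(audio - transcripts), sorted(transcripts - audio)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_pairs.py path/to/recordings
    missing_transcript, missing_audio = find_unpaired(sys.argv[1])
    print("Audio without transcript:", missing_transcript)
    print("Transcript without audio:", missing_audio)
```

Two empty lists mean every recording in the folder is paired by name. The check does not verify that a transcript's content actually matches its audio, so a native speaker should still spot-check a sample.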
What if my language is not widely supported by standard AI tools?
Low-resource languages require more manual work throughout. All translation must be done by a human native speaker, and voice training requires the full recording process. Your Proto contact can advise on what to expect for your specific language.