Training a new language

Adding a new language to your AI Agent requires preparing one or two types of data, depending on whether you need text-only or voice-enabled support.

Text support vs voice support

| Type | What it enables | Data required |
| --- | --- | --- |
| Text support | Your AI Agent recognises and responds to written messages in the target language | Text data only |
| Voice support | Your AI Agent understands spoken messages and responds with voice in the target language | Text data + voice data |

All new languages require text data preparation. Voice data is only required if your AI Agent will support voice channels.
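The rule above can be sketched as a small check. This is only an illustration, not part of any Proto tooling: the function name `required_data` and the `VOICE_CHANNELS` set are hypothetical, and the only voice channel assumed here is "voice IVR", taken from the channel examples on this page.

```python
# Hypothetical set of voice channels; "voice IVR" comes from the examples
# on this page. Extend it to match the channels you actually deploy on.
VOICE_CHANNELS = {"voice ivr"}


def required_data(channels):
    """Return the data types to prepare for a new language."""
    data = ["text"]  # every new language needs text data
    # Voice data is added only when at least one channel is a voice channel.
    if any(channel.strip().lower() in VOICE_CHANNELS for channel in channels):
        data.append("voice")
    return data


print(required_data(["WhatsApp", "Webchat"]))    # ['text']
print(required_data(["Webchat", "voice IVR"]))   # ['text', 'voice']
```

A text-only deployment returns only the text requirement, while any deployment that includes a voice channel returns both.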


Before you begin

Confirm the following with your Proto contact before starting any data work:

| Information | Description | Example |
| --- | --- | --- |
| Target language | The language being added. If the language has regional variants, specify which one – regional variants can differ significantly in vocabulary, spelling, and pronunciation. | English (UK) vs English (US); French (France) vs French (Canada) |
| Use case | The primary purpose of your AI Agent | Complaint handling, customer FAQs, sales support |
| Channels | Where the AI Agent will be deployed | WhatsApp, Webchat, voice IVR |
| Timeline | Your expected go-live date | |

Why specifying the variant matters: Regional variants of a language can differ in vocabulary, spelling, and pronunciation. English (UK) and English (US) are a familiar example, and the same applies to many other languages. Specifying the correct variant from the start ensures the AI Agent is trained on the right language for your audience.


Guides

Follow the relevant guide or guides based on your requirements:

  • Text data preparation – required for all new languages

  • Voice data collection – required for voice-enabled AI Agents only


Common questions

Do I need voice data if my AI Agent is text-only?

No. Voice data is only required if your AI Agent will understand spoken messages or respond out loud. For text-only deployments, you need to follow only the text data preparation guide.

Can my translator use an AI translation tool to speed up the work?

For widely spoken languages such as French, Spanish, or Arabic, AI translation tools can help with speed. For low-resource languages – languages with limited digital resources, such as many local African or Southeast Asian languages – reliable translation tools are unlikely to be available. In those cases, a human native speaker is required for all translation work.

Can existing recordings be used for voice training?

Yes. If you already have audio in the target language with matching transcripts, follow Option A in the voice data collection guide. Check with your Proto contact first to confirm the files meet the format requirements.

What if my language is not widely supported by standard AI tools?

Low-resource languages require more manual work throughout. All translation must be done by a human native speaker, and voice training requires the full recording process. Your Proto contact can advise on what to expect for your specific language.
