# Training a new language

### Text support vs voice support

| Type              | What it enables                                                                          | Data required          |
| ----------------- | ---------------------------------------------------------------------------------------- | ---------------------- |
| **Text support**  | Your AI Agent recognises and responds to written messages in the target language         | Text data only         |
| **Voice support** | Your AI Agent understands spoken messages and responds with voice in the target language | Text data + voice data |

All new languages require text data preparation. Voice data is only required if your AI Agent will support voice channels.

***

### Before you begin

Confirm the following with your Proto contact before starting any data work:

| Information     | Description                                                                                                                                                                 | Example                                                          |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------- |
| Target language | The language being added. If the language has regional variants, specify which one – regional variants can differ significantly in vocabulary, spelling, and pronunciation. | English (UK) vs English (US); French (France) vs French (Canada) |
| Use case        | The primary purpose of your AI Agent                                                                                                                                        | Complaint handling, customer FAQs, sales support                 |
| Channels        | Where the AI Agent will be deployed                                                                                                                                         | WhatsApp, Webchat, voice IVR                                     |
| Timeline        | Your expected go-live date                                                                                                                                                  | —                                                                |

{% hint style="info" %}
**Why specifying the variant matters:** Regional variants of a language can differ in vocabulary, spelling, and pronunciation. English (UK) and English (US) are a familiar example, and the same applies to many other languages. Specifying the correct variant from the start ensures the AI Agent is trained on the language your audience actually uses.
{% endhint %}

***

### Guides

Follow the relevant guide or guides based on your requirements:

{% content-ref url="/pages/dCS14f5rFaYHkrr9eCp6" %}
[Text data preparation](/docs/language-acquisition/training-a-new-language/text-data-preparation.md)
{% endcontent-ref %}

{% content-ref url="/pages/LffWZfYXvAgtNRag4bLY" %}
[Voice data collection](/docs/language-acquisition/training-a-new-language/voice-data-collection.md)
{% endcontent-ref %}

* **Text data preparation** – required for all new languages
* **Voice data collection** – required for voice-enabled AI Agents only
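
The rule above can be sketched as a small helper, shown here only as an illustration. The function name `required_guides` and the `VOICE_CHANNELS` set are hypothetical, and treating "voice IVR" as the only voice channel is an assumption; confirm the actual channel list with your Proto contact.

```python
# Illustrative sketch only: channel names and the voice-channel set are assumptions.
VOICE_CHANNELS = {"voice ivr"}


def required_guides(channels):
    """Return the guides to follow for a list of deployment channels."""
    # Text data preparation is required for all new languages.
    guides = ["Text data preparation"]
    # Voice data collection is required only for voice-enabled AI Agents.
    if any(channel.strip().lower() in VOICE_CHANNELS for channel in channels):
        guides.append("Voice data collection")
    return guides


print(required_guides(["WhatsApp", "Webchat"]))
print(required_guides(["WhatsApp", "voice IVR"]))
```

A text-only deployment returns just the text guide, while any deployment that includes a voice channel returns both.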

***

### Common questions

**Do I need voice data if my AI Agent is text-only?**

No. Voice data is required only if your AI Agent will understand spoken messages or respond out loud. For text-only deployments, follow just the text data preparation guide.

**Can my translator use an AI translation tool to speed up the work?**

For widely spoken languages such as French, Spanish, or Arabic, AI translation tools can speed up the work. For low-resource languages – languages with limited digital resources, such as many local African or Southeast Asian languages – reliable translation tools are unlikely to be available. In those cases, a human native speaker is required for all translation work.

**Can existing recordings be used for voice training?**

Yes. If you already have audio in the target language with matching transcripts, follow Option A in the voice data collection guide. Check with your Proto contact first to confirm that the files meet the format requirements.

**What if my language is not widely supported by standard AI tools?**

Low-resource languages require more manual work throughout. All translation must be done by a human native speaker, and voice training requires the full recording process. Your Proto contact can advise on what to expect for your specific language.

