# Voice API

This document describes the APIs for voice processing, covering **Automatic Speech Recognition (ASR)** and **Text-to-Speech (TTS)** services.

***

### Authentication

Every request to the developer API must include a bearer token in the **Authorization** header, in the form **Authorization: Bearer {subcompany\_takeover\_secret}**. The takeover secret is available on the [teamspace](https://documentation.proto.cx/docs/settings/teamspaces) settings page.

```
Authorization: Bearer <token>
```

***

### Voice API

#### Text-to-Speech (TTS)

Convert text into spoken audio.

**POST**

```
${BASE_URL}/platform/v1/voice/{subcompany_id}/tts
```

**Headers**

```
Authorization: Bearer <token>
Content-Type: application/json
```

**Request body**

* `text` (required): string
* `lang` (optional): `rw` or `kj`
* `response_format` (optional): `mp3` or `wav`
* `speed` (optional): number, for example `1.0`
* `gender` (optional): string, for example `female`

**Request constraints**

* Maximum text length: **5,000 characters**
* Supported languages:
  * `rw` (Kinyarwanda)
  * `kj` (Oshiwambo)
* Speed range: **0.25 – 4.0**
* Supported formats: **mp3**, **wav**
* Timeout: **60 seconds**
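
The constraints above can be checked on the client before a request is sent. The following Python sketch is illustrative only: the function name `validate_tts_request` is hypothetical, the speed bounds are assumed to be inclusive, and the text length is counted in Unicode code points, which may differ from how the server counts characters.

```python
MAX_TEXT_LENGTH = 5000
SUPPORTED_LANGS = {"rw", "kj"}
SUPPORTED_FORMATS = {"mp3", "wav"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # bounds treated as inclusive (assumption)


def validate_tts_request(payload: dict) -> list[str]:
    """Return constraint violations; an empty list means none were found."""
    errors = []

    text = payload.get("text")
    if not isinstance(text, str) or not text:
        errors.append("text is required and must be a non-empty string")
    elif len(text) > MAX_TEXT_LENGTH:
        # len() counts Unicode code points; the server may count differently.
        errors.append(f"text is longer than {MAX_TEXT_LENGTH} characters")

    lang = payload.get("lang")
    if lang is not None and lang not in SUPPORTED_LANGS:
        errors.append(f"lang must be one of {sorted(SUPPORTED_LANGS)}")

    fmt = payload.get("response_format")
    if fmt is not None and fmt not in SUPPORTED_FORMATS:
        errors.append(f"response_format must be one of {sorted(SUPPORTED_FORMATS)}")

    speed = payload.get("speed")
    if speed is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(speed, (int, float)) and not isinstance(speed, bool)
        if not is_number or not MIN_SPEED <= speed <= MAX_SPEED:
            errors.append(f"speed must be a number between {MIN_SPEED} and {MAX_SPEED}")

    return errors
```

Calling `validate_tts_request({"text": "Muraho neza", "lang": "rw"})` returns an empty list, while a payload with `"speed": 5.0` returns one message describing the out-of-range value.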

**cURL example**

```bash
curl -X POST "${BASE_URL}/platform/v1/voice/{subcompany_id}/tts" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Muraho neza",
    "lang": "rw",
    "response_format": "mp3",
    "speed": 1.0,
    "gender": "female"
  }'
```
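
For callers who prefer Python over cURL, the same request could be assembled with the standard library. This is a minimal sketch: `build_tts_request` is a hypothetical helper, and the base URL, subcompany ID, and token are placeholders supplied by the caller. It only builds the request object; sending it is shown in the trailing comment.

```python
import json
import urllib.request


def build_tts_request(base_url, subcompany_id, token, text,
                      lang="rw", response_format="mp3"):
    """Build (but do not send) a POST request for the TTS endpoint."""
    url = f"{base_url.rstrip('/')}/platform/v1/voice/{subcompany_id}/tts"
    body = json.dumps(
        {"text": text, "lang": lang, "response_format": response_format}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# To send the request (the 60-second client timeout mirrors the documented limit):
# with urllib.request.urlopen(request, timeout=60) as response:
#     payload = response.read()
```

Keeping construction separate from sending makes it easy to inspect the URL, headers, and body before any network call is made.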

**Response**

The API returns a **Base64-encoded audio payload** representing the generated audio file.

The response is not a downloadable file and must be decoded on the client side.

***

**Decoding the response to MP3 (Windows)**

If you are using **Windows**, you can decode the Base64 response into an MP3 file using **PowerShell**.

1. Copy the Base64 response into a file (for example, `input.txt`)
2. Run the following command in PowerShell:

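```powershell
# Read the Base64 text and strip whitespace and line breaks
$b64 = (Get-Content input.txt -Raw) -replace '\s', ''
# Build an absolute path: .NET file APIs do not follow PowerShell's current location
$out = Join-Path $PWD.Path "output.mp3"
[IO.File]::WriteAllBytes($out, [Convert]::FromBase64String($b64))
```

This generates an `output.mp3` file in the current PowerShell directory.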
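
On other platforms, the same decoding step can be done in Python. The sketch below assumes the response body is the raw Base64 string; if the API wraps it in another structure, extract the Base64 value first. The helper name `decode_tts_audio` is hypothetical.

```python
import base64
import re
from pathlib import Path


def decode_tts_audio(b64_text: str, out_path: str) -> int:
    """Decode a Base64 audio payload, write it to out_path, return the byte count."""
    # Drop spaces and line breaks, as the PowerShell snippet does.
    cleaned = re.sub(r"\s+", "", b64_text)
    # validate=True raises binascii.Error on characters outside the Base64 alphabet.
    audio = base64.b64decode(cleaned, validate=True)
    Path(out_path).write_bytes(audio)
    return len(audio)
```

For example, `decode_tts_audio(Path("input.txt").read_text(), "output.mp3")` writes the decoded audio next to the script's working directory.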

***

#### Automatic Speech Recognition (ASR)

Convert spoken audio into text.

**POST**

```
${BASE_URL}/platform/v1/voice/{subcompany_id}/asr
```

**Headers**

```
Authorization: Bearer <token>
Content-Type: multipart/form-data
```

**Request body**

* `file` (required): the audio file to transcribe, sent as a multipart form field

**Request constraints**

* Maximum file size: **15 MB**
* Supported formats: **mp3**, **wav**
* Timeout: **60 seconds**
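
A client can check a file against these limits before uploading it. The sketch below is illustrative: `check_asr_file` is a hypothetical helper, "15 MB" is interpreted conservatively as 15,000,000 bytes, and the format is inferred from the file extension only, not from the file contents.

```python
from pathlib import Path

MAX_FILE_BYTES = 15 * 1000 * 1000  # "15 MB" read as decimal megabytes (assumption)
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}


def check_asr_file(path: str, max_bytes: int = MAX_FILE_BYTES) -> list[str]:
    """Return a list of problems with the file; an empty list means none were found."""
    file_path = Path(path)
    problems = []

    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file_path.suffix or '(none)'}")

    # stat() raises FileNotFoundError if the file does not exist.
    size = file_path.stat().st_size
    if size > max_bytes:
        problems.append(f"file is {size} bytes, above the {max_bytes}-byte limit")

    return problems
```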

**cURL example**

```bash
# -F makes curl send multipart/form-data and set the Content-Type header,
# including the boundary, so the header is not passed explicitly.
curl -X POST "${BASE_URL}/platform/v1/voice/{subcompany_id}/asr" \
  -H "Authorization: Bearer <token>" \
  -F "file=@audio.mp3"
```

**Response**

The API returns the **transcribed text** extracted from the uploaded audio file.

***

### Notes

* Both endpoints require a valid `{subcompany_id}`
* Authentication uses the **subcompany takeover secret** as the bearer token
* TTS responses require **client-side decoding**
* This API currently supports **female voices** in **Kinyarwanda** and **Oshiwambo**

