# Voice API

This document describes the APIs for voice processing, covering **Automatic Speech Recognition (ASR)** and **Text-to-Speech (TTS)** services.

***

### Authentication

Every request to the developer API must include a bearer token in the **Authorization** header, in the form **Authorization: Bearer {subcompany\_takeover\_secret}**. You can find the takeover secret on the [teamspace](https://documentation.proto.cx/docs/settings/teamspaces) settings page. The examples in this document use `<token>` as a placeholder for this secret.

```
Authorization: Bearer <token>
```

***

### Voice API

#### Text to Speech (TTS)

Convert text into spoken audio.

**POST**

```
${BASE_URL}/platform/v1/voice/{subcompany_id}/tts
```

**Headers**

```
Authorization: Bearer <token>
Content-Type: application/json
```

**Request body**

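```json
{
  "text": "string",
  "lang": "rw",
  "response_format": "mp3",
  "speed": 1.0,
  "gender": "female"
}
```

* `text` (string, required)
* `lang` (string, optional): `rw` or `kj`
* `response_format` (string, optional): `mp3` or `wav`
* `speed` (number, optional)
* `gender` (string, optional)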

**Request constraints**

* Maximum text length: **5,000 characters**
* Supported languages:
  * `rw` (Kinyarwanda)
  * `kj` (Oshiwambo)
* Speed range: **0.25 – 4.0**
* Supported formats: **mp3**, **wav**
* Timeout: **60 seconds**
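
The constraints above can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the API: `validate_tts_request` and the constants are hypothetical names, and it assumes that the speed bounds are inclusive, that `text` must be non-empty, and that the 5,000-character limit counts characters the way Python's `len` does. None of these details is confirmed by this document.

```python
# Limits copied from the "Request constraints" list above.
SUPPORTED_LANGS = {"rw", "kj"}
SUPPORTED_FORMATS = {"mp3", "wav"}
MAX_TEXT_LENGTH = 5000
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # assumed to be inclusive bounds


def validate_tts_request(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload passes these checks."""
    problems = []

    text = payload.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required and must be a non-empty string")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append(f"text exceeds {MAX_TEXT_LENGTH} characters")

    if "lang" in payload and payload["lang"] not in SUPPORTED_LANGS:
        problems.append("lang must be 'rw' or 'kj'")

    if "response_format" in payload and payload["response_format"] not in SUPPORTED_FORMATS:
        problems.append("response_format must be 'mp3' or 'wav'")

    if "speed" in payload:
        speed = payload["speed"]
        is_number = isinstance(speed, (int, float)) and not isinstance(speed, bool)
        if not is_number or not MIN_SPEED <= speed <= MAX_SPEED:
            problems.append(f"speed must be a number between {MIN_SPEED} and {MAX_SPEED}")

    return problems
```

For example, `validate_tts_request({"text": "Muraho neza", "speed": 9})` returns a one-item list that describes the out-of-range speed. The server remains the authority on what it accepts; a check like this only avoids an obviously invalid round trip.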

**cURL example**

```bash
curl -X POST "${BASE_URL}/platform/v1/voice/{subcompany_id}/tts" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Muraho neza",
    "lang": "rw",
    "response_format": "mp3",
    "speed": 1.0,
    "gender": "female"
  }'
```
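
The same request can be assembled in Python with the standard library. This is a sketch under assumptions rather than an official client: `build_tts_request` is a hypothetical helper, and `base_url`, `subcompany_id`, and `token` stand in for the values of `${BASE_URL}`, `{subcompany_id}`, and `<token>` used above.

```python
import json
import urllib.request


def build_tts_request(base_url, subcompany_id, token, text, **options):
    """Build (but do not send) a POST request for the TTS endpoint.

    `options` carries the optional body fields, for example lang="rw".
    """
    url = f"{base_url.rstrip('/')}/platform/v1/voice/{subcompany_id}/tts"
    body = json.dumps({"text": text, **options}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The returned object can be sent with `urllib.request.urlopen(request, timeout=60)`, where the 60-second value mirrors the timeout listed in the request constraints.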

**Response**

The API returns a **Base64-encoded audio payload** representing the generated audio file.

The response body is Base64 text, not a binary file download, so the client must decode it before the audio can be played or saved.
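
As an illustration of client-side decoding, the sketch below turns the Base64 text into raw audio bytes in Python. `decode_tts_audio` is a hypothetical helper name, and the code assumes the response body is the bare Base64 string; this document does not say whether the payload is wrapped in another structure such as JSON.

```python
import base64


def decode_tts_audio(b64_payload: str) -> bytes:
    """Decode Base64 audio text into raw bytes.

    Whitespace and line breaks are stripped first, so text copied from a
    terminal or file decodes cleanly. Invalid characters raise binascii.Error.
    """
    cleaned = "".join(b64_payload.split())
    return base64.b64decode(cleaned, validate=True)
```

To save the result, write the bytes to a file whose extension matches the requested format, for example `pathlib.Path("output.mp3").write_bytes(decode_tts_audio(payload))`.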

***

**Decoding the response to MP3 (Windows)**

If you are using **Windows**, you can decode the Base64 response into an MP3 file using **PowerShell**.

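1. Save the Base64 response to a text file (for example, `input.txt`).
2. From the folder that contains the file, run the following commands in PowerShell:

```powershell
# Read the Base64 text and strip all whitespace and line breaks
$b64 = (Get-Content input.txt -Raw) -replace '\s',''
# Build an absolute path: .NET resolves relative paths against the process directory, not $PWD
[IO.File]::WriteAllBytes((Join-Path $PWD "output.mp3"), [Convert]::FromBase64String($b64))
```

This creates an `output.mp3` file in the current PowerShell directory.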

***

#### Automatic Speech Recognition (ASR)

Convert spoken audio into text.

**POST**

```
${BASE_URL}/platform/v1/voice/{subcompany_id}/asr
```

**Headers**

```
Authorization: Bearer <token>
Content-Type: multipart/form-data
```

**Request body**

* `file` (required): the audio file to transcribe, sent as a multipart form field

**Request constraints**

* Maximum file size: **15 MB**
* Supported formats: **mp3**, **wav**
* Timeout: **60 seconds**
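
A client can screen files against these limits before uploading. The sketch below is illustrative only: `check_asr_file` is a hypothetical name, it judges the format by file extension rather than by inspecting the audio, and it assumes 15 MB means 15 × 1,048,576 bytes, which this document does not specify.

```python
from pathlib import Path

MAX_FILE_BYTES = 15 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}


def check_asr_file(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("file must be an mp3 or wav file")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 15 MB limit")
    return problems
```

For a file on disk, the size can be read with `os.path.getsize(path)` and passed in as `size_bytes`.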

**cURL example**

```bash
curl -X POST "${BASE_URL}/platform/v1/voice/{subcompany_id}/asr" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3"
```
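
For clients that do not use cURL, the multipart body has to carry a boundary, and that boundary must also appear in the `Content-Type` header. The Python sketch below builds such a request with the standard library. `build_asr_request` is a hypothetical helper, the `audio/mpeg` and `audio/wav` part types are assumptions about what the server accepts, and the filename is assumed to contain no double quotes.

```python
import uuid
import urllib.request


def build_asr_request(base_url, subcompany_id, token, filename, audio_bytes):
    """Build (but do not send) a multipart/form-data POST for the ASR endpoint."""
    boundary = uuid.uuid4().hex  # random marker that separates the form parts
    part_type = "audio/mpeg" if filename.lower().endswith(".mp3") else "audio/wav"
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode("utf-8"),
        f"Content-Type: {part_type}\r\n\r\n".encode("ascii"),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    url = f"{base_url.rstrip('/')}/platform/v1/voice/{subcompany_id}/asr"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # The boundary in the header must match the one used in the body.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

As with any `urllib` request, the result can be sent with `urllib.request.urlopen(request, timeout=60)`. Higher-level HTTP libraries usually generate the boundary themselves, in which case the `Content-Type` header should be left for the library to set.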

**Response**

The API returns the **transcribed text** extracted from the uploaded audio file.

***

### Notes

* Both endpoints require a valid `{subcompany_id}` in the URL path
* Authentication uses the **subcompany takeover secret** as the bearer token
* TTS responses require **client-side decoding** from Base64
* This API currently supports **Kinyarwanda and Oshiwambo** with a **female voice**
