Voice Generation

LEVANTE tasks are designed to be adapted across a wide variety of languages and dialects, and to undergo rapid iteration during the adaptation process. For these reasons, recording and editing audio assets with voice actors is not practical: it is cost- and time-prohibitive, and recorded assets are very hard to update when task items or instructions change. Thus, all speech for all LEVANTE tasks is generated via AI audio generation software.

As of late 2024, we are still iterating on this process, and we expect audio assets to improve as we identify better vendors and improve our pipeline.

Details

This document explains how we generate audio for the LEVANTE tasks. Note that this is a technical document, intended in part for developers working on LEVANTE.

The broad steps:

Ensure translations are complete and correct.
Select a voice on play.ht, our current AI voice provider. To do so, generate a sample in the new language for each candidate voice, and ask native speakers (ideally, LEVANTE site researchers) to evaluate which voice has the best pronunciation, sounds the most natural, and is, ideally, somewhat child-friendly (enthusiastic and not too fast, although speed can be modified in the API).
Use parallel_download to generate audio for the new language, using the selected voice. File names are taken from the item_id column, and audio is generated from the specified language column (e.g., de = German).
Add the new audio files to the appropriate directory in the LEVANTE core-tasks repo. For example, German memory game audio should be put in core-tasks/assets/memory-game/de/shared/
Upload the new audio files to the appropriate Google Cloud bucket, in the appropriate language directory. (E.g., for the memory task: https://console.cloud.google.com/storage/browser/memory-game-levante )
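
Before generating audio, it can help to check the translation sheet for gaps and to see which files would be produced. The following is a minimal sketch of that check, not the project's actual script. It assumes the CSV has an item_id column and one column per language code, as described above. The function name plan_audio_jobs, the .mp3 extension, and the output directory are hypothetical choices made for illustration.

```python
import csv
import io
from pathlib import Path


def plan_audio_jobs(csv_text, lang, out_dir):
    """Return (jobs, missing) for one language column.

    jobs:    list of (output_path, text) pairs, one per translated item
    missing: list of item_ids whose translation in `lang` is empty
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    if "item_id" not in fields or lang not in fields:
        raise ValueError(f"CSV needs 'item_id' and '{lang}' columns, got {fields}")

    jobs, missing = [], []
    for row in reader:
        item_id = (row["item_id"] or "").strip()
        text = (row[lang] or "").strip()
        if not item_id:
            continue  # skip blank rows
        if not text:
            missing.append(item_id)  # translation incomplete, do not generate
            continue
        # Assumption: one file per item, named after item_id, with an .mp3 suffix
        jobs.append((Path(out_dir) / f"{item_id}.mp3", text))
    return jobs, missing


if __name__ == "__main__":
    sample = "item_id,en,de\nhello,Hello,Hallo\nbye,Goodbye,\n"
    jobs, missing = plan_audio_jobs(sample, "de", "memory-game/de/shared")
    for path, text in jobs:
        print(path, "<-", text)
    print("missing translations:", missing)
```

If the missing list is not empty, the translations are not yet complete for that language, and it is probably better to resolve that before spending API credits on generation.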

Below we document which voice has been selected for each language, and give an example of how to call the generation script for each.

Voice Choices

English: python playDotHt_v1.py translation-items.csv 'en' 'en-US-AriaNeural'

Spanish (Colombia): python playDotHt_v1.py translation-items.csv 'es-co' 'es-CO-SalomeNeural'

German: python playDotHt_v1.py translation-items.csv 'de' 'VickiNeural'
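
The voice choices above could also be kept in one place, so that the command for a given language is built rather than retyped. This is only a sketch: the VOICES table and the build_command helper are hypothetical, and the argument order (CSV file, language column, voice) is taken from the example invocations above rather than from the script itself.

```python
import shlex

# Voice identifiers copied from the list above (language column -> voice)
VOICES = {
    "en": "en-US-AriaNeural",
    "es-co": "es-CO-SalomeNeural",
    "de": "VickiNeural",
}


def build_command(lang, csv_path="translation-items.csv"):
    """Build the argument list for one language, mirroring the examples above."""
    if lang not in VOICES:
        raise KeyError(f"No voice selected yet for language '{lang}'")
    return ["python", "playDotHt_v1.py", csv_path, lang, VOICES[lang]]


if __name__ == "__main__":
    for lang in VOICES:
        # shlex.join renders the list as a copy-pasteable shell command
        print(shlex.join(build_command(lang)))
```

The list returned by build_command can be passed directly to subprocess.run if the script should be launched from Python instead of from the shell.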