LEVANTE / ROAR internationalization process

Internationalization

Currently available internationalizations of the LEVANTE tasks

English (US) 🇺🇸
Spanish (CO) 🇨🇴
German (DE) 🇩🇪

General internationalization workflow

Adapting the LEVANTE tasks to a new language is a big job! If your team is interested in working with the Data Coordinating Center (DCC) at Stanford to internationalize a task, please budget sufficient staff time for repeated rounds of review, testing, and translation checking (see the budget guidance below). The rest of this page describes some of the processes we have developed.

For non-language tasks, the workflow is as follows:

AI translation (items + instructions)
Hand checking by contractor
Checking/vetting by site staff
AI generation of audio assets

For language and reading tasks, adaptation is more challenging, requiring additional:

Task-specific item creation

Grammar (Sentence Understanding): Addition/removal of relevant items with help from a consultant expert in syntactic acquisition for the relevant language
Phoneme awareness (Language Sounds): Creation of materials for the sounds of the language (for alphabetic languages) with help from a consultant expert in reading development for the relevant language

Item checking and updating

Item review by language specialists with expertise in the new language
Creation of new illustrations

Validation

Differential item functioning (DIF) checks against existing task versions
Within-site reliability checking
Configural invariance checking with other language tasks
Potential validation with other measures, e.g. standardized tasks

Item bank norming

Sentence Understanding: Collection of enough data on new items to allow the creation and deployment of an adaptive test with language-specific parameters
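As an illustration of the within-site reliability check listed under Validation, the sketch below computes Cronbach's alpha from a matrix of item scores. This page does not say which reliability statistic the DCC uses; alpha is chosen here only because it is a common index, and the function name and data layout (one row per child, one column per item) are assumptions made for the example.

```python
from statistics import pvariance
from typing import List, Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a score matrix (rows = children, columns = items).

    Hypothetical helper for illustration; not part of any LEVANTE codebase.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every child must have a score for every item")

    # Variance of each item across children.
    item_vars: List[float] = [
        pvariance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of each child's total score.
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)
```

Population variance is used for both the item and total scores; using sample variance for both would give the same ratio. For adaptive tests, where children see different items, a model-based reliability estimate would likely be more appropriate than alpha.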

The challenges involved in this work will vary by language / dialect. For example, adapting to European languages with relatively similar grammar and morphology to those already supported will likely be easier than adapting to a language with a non-alphabetic writing system.

Budgets for validation and item-bank norming depend largely on whether the site is willing to use its first wave of data collection for measure refinement. Doing so will likely mean slightly suboptimal measures for that first wave; if that is acceptable, costs will be relatively low. If instead the site would like to validate the newly internationalized measures before deployment, costs will be higher because independent data must be collected on the adapted measures.

Detailed internationalization processes

The steps we take are:

Automatically translate the items in the LEVANTE-item-bank-translations V2 (access restricted to collaborators) spreadsheet using DeepL.
Ask a native speaker to review, comment on, and suggest changes to the translations. Ideally, the translations are reviewed first by a trained translator and then by a member of the collaborating site's research team. Reviewers will find it most helpful to go through the tasks in English before and during the translation process, for context and imagery. Further task-specific translation considerations are listed below.
Once the translations are finalized, generate new task audio: see Voice Generation.
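The steps above can be sketched as a small pipeline that machine-translates each item and then tracks it through the two review stages before audio generation. This is a minimal sketch under stated assumptions, not the DCC's tooling: the class, function, and stage names are hypothetical, the item IDs and texts are invented, and the translator is passed in as a plain callable so the example runs offline. With the official `deepl` Python package, that callable could wrap `deepl.Translator(auth_key).translate_text(text, target_lang="DE").text`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Review stages in the order the steps describe them (names are hypothetical).
STAGES = ["machine", "translator_reviewed", "site_approved"]


@dataclass
class ItemTranslation:
    item_id: str
    source_text: str              # English text from the item bank
    target_text: str = ""
    stage: str = "untranslated"


def machine_translate(
    items: List[ItemTranslation], translate: Callable[[str], str]
) -> List[ItemTranslation]:
    """Step 1: fill in a first-pass translation for every untranslated item."""
    for item in items:
        if item.stage == "untranslated":
            item.target_text = translate(item.source_text)
            item.stage = "machine"
    return items


def advance(item: ItemTranslation, reviewed_text: Optional[str] = None) -> ItemTranslation:
    """Step 2: move an item to the next review stage, optionally applying a correction."""
    if item.stage not in STAGES:
        raise ValueError(f"{item.item_id} has not been machine-translated yet")
    idx = STAGES.index(item.stage)
    if idx == len(STAGES) - 1:
        raise ValueError(f"{item.item_id} is already approved")
    if reviewed_text is not None:
        item.target_text = reviewed_text
    item.stage = STAGES[idx + 1]
    return item


def ready_for_audio(items: List[ItemTranslation]) -> List[ItemTranslation]:
    """Step 3: only fully approved items should go on to audio generation."""
    return [item for item in items if item.stage == "site_approved"]


# Usage with a stand-in translator (a real run would call a translation service).
def fake_translate(text: str) -> str:
    return f"[de] {text}"


items = [
    ItemTranslation("vocab-001", "Point to the dog."),
    ItemTranslation("vocab-002", "Point to the ball."),
]
machine_translate(items, fake_translate)
advance(items[0], "Zeig auf den Hund.")   # trained translator corrects the text
advance(items[0])                          # site staff approve it unchanged
print([item.item_id for item in ready_for_audio(items)])
```

Keeping the stage on each item makes it easy to see which translations still need a translator or site review before any audio is generated for them.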

Task-specific Translation/Adaptation

Sentence Understanding

Adapting the Sentence Understanding task (which tests grammatical abilities) for other languages is more challenging. The English sentences are not the simplest, most colloquial way of phrasing the utterances, and the translations should not be either. Because this task is meant to measure the comprehension of particular grammatical structures, the translations should try to use the same grammar as the original English, even if the result is somewhat awkward. If there is no equivalent structure, preserve the meaning. We expect that the difficulty of each item will not be exactly the same in every language; the goal is to approximately match the original, rather than to find the simplest phrasing.

The construct map (list of sub-constructs, e.g. subject-verb agreement, multiple object sentences, clausal conjunctions) for English is not expected to be the same for other languages. Thus, this task in particular requires the generation and norming of new items.

Vocabulary

A few items have been problematic in our translations (so far); these require replacement with new items. We are still in the process of determining how variably the vocabulary instrument performs across languages (and hence how detailed the process of adaptation should be).

We expect that in some cultural contexts, the pictures (and in some cases the words) used in the assessment will need to be updated, even for languages for which we already have assets.

Reading Tasks (Language Sounds, Word Reading, and Sentence Reading)

The reading assessments offered by LEVANTE are versions of the ROAR tasks. This means that they must be internationalized in collaboration with ROAR. Typically, these tasks (especially Language Sounds; aka Phoneme Awareness) require significant expertise in reading development for the particular language/culture being studied. Our process for technical collaboration with the ROAR team is found here.

Budget Guidance for Sites Considering Internationalization

If your site is considering a new adaptation of the LEVANTE measures, the DCC will be glad to support you by producing first-pass translations, modifying measures according to site staff guidance, and collaborating on testing these implementations. However, sites are responsible for the final content of adaptations, since the DCC does not have expertise in your language or culture.

Please ensure that your site budget includes:

Support for a staff member on your team to manage translation checking and iterative feedback on task materials and instructions. Assume this is at least a couple of months of part-time work.
Support for expert consultants that you employ to help with checking the grammar and reading tasks. The amount of time this takes is likely to vary widely from language to language.
If you want to collect a separate validation sample (as opposed to using unvalidated measures for the first wave of your main study and then adjusting on the fly), sufficient budget to collect data on the language measures from a few hundred children (around 200, as a ballpark figure) in order to detect issues in the materials and gather item norms for adaptive testing.