Internationalization
Currently available internationalizations of the LEVANTE tasks
- English (US) 🇺🇸
- Spanish (CO) 🇨🇴
- German (DE) 🇩🇪
General internationalization workflow
Adapting the LEVANTE tasks to a new language is a big job! If you are a team interested in working with the Data Coordinating Center (DCC) at Stanford to internationalize a task, please make sure that you budget sufficient staff time to do a lot of iterative review, testing, and translation checking (see below). Below you will find some of the processes we have developed.
For non-language tasks, the workflow is as follows:
- AI translation (items + instructions)
- Hand checking by contractor
- Checking/vetting by site staff
- AI generation of audio assets
For language and reading tasks, adaptation is more challenging, requiring additional:
- Task-specific item creation
- Grammar: Addition/removal of relevant items with help from a consultant expert in syntactic acquisition for the relevant language
- Phoneme awareness: Creation of materials for the sounds of the language (for alphabetic languages) with help from a consultant expert in reading development for the relevant language
- Item checking and updating
- Item review by language specialists with expertise in the new language
- Creation of new illustrations
- Validation
- Differential item functioning (DIF) checks with existing task versions
- Within-site reliability checking
- Configural invariance checking with other language tasks
- Potential validation with other measures, e.g. standardized tasks
- Item bank norming
- Sentence understanding: Collection of enough data on new items to allow the creation and deployment of an adaptive test with language-specific parameters
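As an illustration of the within-site reliability check listed above, here is a minimal Python sketch that computes Cronbach's alpha, one common internal-consistency statistic. The choice of statistic, the function name `cronbach_alpha`, and the toy response matrix are assumptions made for illustration; this page does not specify which reliability measure the DCC uses.

```python
from statistics import pvariance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for a children x items matrix of item scores (e.g. 0/1 correct)."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    # Variance of each item across children.
    item_vars = [pvariance([child[i] for child in scores]) for i in range(n_items)]
    # Variance of each child's total score.
    total_var = pvariance([sum(child) for child in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Toy data (made up): 5 children x 4 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
```

For this toy matrix alpha comes out at roughly 0.8. Real checks would use the site's actual response data and may call for a model-based reliability estimate instead.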
The challenges involved in this work will vary by language / dialect. For example, adapting to European languages with relatively similar grammar and morphology to those already supported will likely be easier than adapting to a language with a non-alphabetic writing system.
Budgets for validation and item-bank norming depend largely on whether the site doing this work would like to use its first wave of data collection for measure refinement. Doing so will likely yield slightly suboptimal measures for the first wave; if this is acceptable, costs will be relatively low. In contrast, if the site would like to validate the newly internationalized measures prior to deployment, costs will be higher because independent data must be collected on the adapted measures.
Detailed internationalization processes
The steps we take are:
- Automatically translate the items in the LEVANTE-item-bank-translations V2 (access restricted to collaborators) spreadsheet using DeepL.
- Ask native speakers to review, comment on, and suggest changes to the translations. Ideally, the review is done first by a trained translator and then by a member of the collaborating site research team. It is most helpful to review the tasks in English before and during the translation process for context and imagery. Further task-specific translation issues are listed below.
- Once the translations are finalized, generate new task audio: see Voice Generation.
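The automatic translation step above can be sketched in Python. This is a minimal sketch under stated assumptions, not the DCC's actual pipeline: the CSV export, the column names (`item_id`, `en`, `de`, `status`), and the helper names are hypothetical, and the DeepL call assumes the official `deepl` Python package. The translation function is passed in as a parameter, so the loop can be tried with a stub and without an API key.

```python
import csv
import io
from typing import Callable, Dict, List


def read_items(csv_text: str) -> List[Dict[str, str]]:
    """Parse a CSV export of the item bank into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def translate_items(
    rows: List[Dict[str, str]],
    translate: Callable[[str], str],
    source_col: str = "en",  # hypothetical column name
    target_col: str = "de",  # hypothetical column name
) -> List[Dict[str, str]]:
    """Fill target_col with a machine translation wherever it is empty.

    Existing translations are left untouched so reviewer edits are not overwritten.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        if not (new_row.get(target_col) or "").strip():
            new_row[target_col] = translate(new_row[source_col])
            new_row["status"] = "machine_translated"  # flag for human review
        out.append(new_row)
    return out


def make_deepl_translator(auth_key: str, target_lang: str = "DE") -> Callable[[str], str]:
    """Wrap the DeepL client as a str -> str function (requires `pip install deepl`)."""
    import deepl  # imported lazily so the rest of the sketch runs without the package

    client = deepl.Translator(auth_key)
    return lambda text: client.translate_text(
        text, source_lang="EN", target_lang=target_lang
    ).text


# Demo with a stub translator instead of a real DeepL call.
rows = read_items("item_id,en,de\nvocab-1,apple,\nvocab-2,dog,Hund\n")
translated = translate_items(rows, translate=lambda s: f"[DE] {s}")
```

With a real key, `translate=make_deepl_translator(auth_key)` would replace the stub. Marking machine-translated rows keeps them easy to find during the native-speaker review that follows.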
Task-specific Translation/Adaptation
Sentence Understanding
Adapting the Sentence Understanding task (which tests grammatical abilities) for other languages is a challenge. The English sentences are not the simplest, most colloquial way of phrasing the utterances; thus, the translations should not be either. Instead, as this task is meant to measure the comprehension of particular grammatical structures, the translations should try to use the same grammar (even if somewhat awkward) as was used in the original English. If there is no equivalent structure, stick to the meaning. Of course, we expect that the difficulty of each item will not be exactly the same in each language, but the goal is to approximately match the original difficulty, rather than to give the simplest way of phrasing.
The construct map (list of sub-constructs, e.g. subject-verb agreement, multiple object sentences, clausal conjunctions) for English is not expected to be the same for other languages. Thus, this task in particular requires the generation and norming of new items.
Vocabulary
A few items have been problematic in our translations (so far); these require replacement with new items. We are still in the process of determining how variably the vocabulary instrument performs across languages (and hence how detailed the process of adaptation should be).
We expect that in some cultural contexts, the pictures (and in some cases the words) used in the assessment will need to be updated, even for languages for which we already have
Reading Tasks (Phoneme Awareness, Single Word Reading, and Sentence Reading Efficiency)
The reading assessments offered by LEVANTE are versions of the ROAR tasks. This means that they must be internationalized in collaboration with ROAR. Typically, these tasks (especially Phoneme Awareness) require significant expertise in reading development for the particular language/culture being studied. Our process for technical collaboration with the ROAR team is found here.
Budget Guidance for Sites Considering Internationalization
If your site is considering a new adaptation of the LEVANTE measures, the DCC will be glad to support you by doing first-pass translations, making modifications to measures per site staff guidance, and collaborating on testing these implementations. Sites are responsible for the final content of adaptations, however, since the DCC does not have expertise in your language or culture.
Please ensure that your site budget includes:
- Support for a staff member on your team to manage translation checking and iterative feedback on task materials and instructions. Assume this is at least a couple of months of part-time work.
- Support for expert consultants that you employ to help with checking the grammar and reading tasks. The amount of time this will take likely varies widely from language to language.
- If you want to collect a separate validation sample (as opposed to using unvalidated measures for the first wave of your main study and then adjusting on the fly), sufficient budget to collect data on the language measures from several hundred children (roughly 200, as a ballpark figure) in order to detect issues in the materials and gather item norms for adaptive testing.