Philosophy of measurement, data analysis, and interpretation

This page documents our (evolving) approach to measurement, data analysis, and interpretation.

Measurement

We aspire to measure constructs invariantly across sites, recognizing that this goal will not always be met. Specifically, we aim to achieve metric and/or scalar invariance for some measures, which would allow us both to compute latent construct scores for individuals using the same models across sites and to use the same parameters for adaptive testing across sites. This is challenging in diverse settings, so we will conduct stringent tests to determine whether the goal has been met. For both child and caregiver measures, we will test for measurement invariance across sites by fitting multi-group models. We will also assess individual items for differential item functioning (DIF) across sites.
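As a rough illustration of what an item-level DIF check across two sites can look like, the sketch below computes a Mantel-Haenszel common odds ratio for a single item. This page does not say which DIF method is used, so Mantel-Haenszel is an assumption chosen for simplicity, as are the function name, the record layout, and the site labels "ref" and "focal". Children are matched on a score (for example, the total score on the task), and the item's odds of a correct response are compared between sites within each matched stratum.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for one item across matched score strata.

    records: iterable of (site, matching_score, item_correct), where
    site is "ref" or "focal" (hypothetical labels) and item_correct
    is a bool. Values near 1.0 suggest little DIF on this item.
    """
    # One 2x2 table per stratum: [ref correct, ref incorrect,
    #                             focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for site, score, correct in records:
        idx = (0 if site == "ref" else 2) + (0 if correct else 1)
        tables[score][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return num / den


# Made-up toy records for a single stratum (matching score of 5).
toy = (
    [("ref", 5, True)] * 3 + [("ref", 5, False)] * 1
    + [("focal", 5, True)] * 2 + [("focal", 5, False)] * 2
)
print(mantel_haenszel_odds_ratio(toy))
```

A screen like this flags items for closer inspection; it does not replace the multi-group invariance models described above.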

Generalization, causal inference, and interpretation

Notwithstanding the high degree of measurement invariance we aspire to, we still must be very cautious about the generalizations we make across sites.

From the LEVANTE paper:

Given the design of LEVANTE, direct comparisons of outcomes between sites are not appropriate. Myriad differences in participant sampling and recruitment as well as in the particular circumstances in which measures are administered confound any inference about differences between sites. For example, while participants in one pilot site were recruited through their schools and tested on tablets in their classroom, participants in another volunteered to participate through an online database and were tested on a heterogeneous mix of computers and tablets in their homes.

We may be tempted to compare site X and site Y on some measure and infer that something general about the site caused the difference in that measure (e.g., “children in site X are better at math than children in site Y”). But this inference is confounded by differences in sampling, recruitment, and administration across sites, so comparisons of measure intercepts across sites in LEVANTE are not warranted.

We are also interested in comparing other analyses across sites. For example, we might be interested in the correlation between math and home environments. We can estimate this correlation within each site and compare its magnitude across sites. Our view is that these comparisons can be interesting, but we must be clear about the confounds that remain. The correlation is less obviously confounded by differences in task administration across sites, but it might still be affected by sampling or by aspects of our measurement (for example, a measure that is at ceiling, which restricts variance and can attenuate the correlation). Thus we must still be very cautious about making causal inferences in this case.
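The within-site comparison described above can be sketched as follows. This is a minimal illustration rather than the project's analysis code: the function names, the row layout `(site, math_score, home_env_score)`, and the toy values are all assumptions. Each site's correlation is reported with a Fisher z confidence interval so that magnitudes are compared with their uncertainty in view.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via the Fisher z transform (needs n > 3, |r| < 1)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def within_site_correlations(rows):
    """rows: iterable of (site, math_score, home_env_score) tuples (hypothetical layout)."""
    by_site = {}
    for site, math_score, home_score in rows:
        xs, ys = by_site.setdefault(site, ([], []))
        xs.append(math_score)
        ys.append(home_score)
    results = {}
    for site, (xs, ys) in by_site.items():
        r = pearson_r(xs, ys)
        results[site] = {"n": len(xs), "r": r, "ci": fisher_ci(r, len(xs))}
    return results


# Made-up toy rows, for illustration only.
rows = [
    ("site_x", 1, 2), ("site_x", 2, 1), ("site_x", 3, 4),
    ("site_x", 4, 3), ("site_x", 5, 5),
    ("site_y", 1, 3), ("site_y", 2, 2), ("site_y", 3, 5),
    ("site_y", 4, 1), ("site_y", 5, 4),
]
for site, res in within_site_correlations(rows).items():
    print(site, res)
```

Overlapping intervals, or a correlation computed on a measure at ceiling, are reminders that a difference in magnitude between sites does not by itself support a causal reading.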