Random Variation: Multilevel Research: Definition of Variables

Sunday, December 18, 2011

Multilevel Research: Definition of Variables

In a prior post, I described a typical hierarchical data structure: students nested within classrooms, classrooms nested within schools, and schools nested within districts. A terminology has developed for describing the types of variables found at the different levels of such a hierarchy.

The typology in the graphic above is taken from Hox (2000) and Hox (1995), who adapted it from Lazarsfeld and Menzel (1961):

Global or absolute variables refer only to the level at which they are defined, e.g., student intelligence or gender.
Relational variables belong to a single level but describe a unit's relationships to other units of analysis at the same level. Sociometric status, for example, measures the extent to which someone is liked or disliked by their peers.
Analytical variables are constructed from variables at a lower level. School mean achievement, for example, is built from the achievement of individual students, measured at the lower level and aggregated upward. Statistics other than means can also be used; for example, a standard deviation measures heterogeneity.
Structural variables summarize the distribution of relational variables at the lower level. Social network indices (such as measures of social capital) are one example.
Contextual variables refer to the higher-level units. Every lower-level unit receives the value of the higher-level variable, which describes the "context" in which that unit is measured.
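A minimal Python sketch can make the relational/structural distinction concrete. It assumes a hypothetical set of friendship nominations: the student names, the classroom labels, and the helpers `sociometric_status` and `nomination_density` are all invented for illustration. Here sociometric status is simplified to the number of positive nominations a student receives, and network density (nominations made divided by nominations possible within a classroom) stands in as one possible structural index; neither operationalization comes from Hox.

```python
from collections import defaultdict

# Hypothetical data: each student's classroom and the classmates they name as friends.
classroom = {"ann": "A", "bob": "A", "cat": "A", "dan": "B", "eve": "B", "fay": "B"}
nominations = {
    "ann": ["bob", "cat"],
    "bob": ["ann"],
    "cat": ["ann"],
    "dan": ["eve"],
    "eve": [],
    "fay": ["eve"],
}


def sociometric_status(nominations):
    """Relational variable (simplified): number of peer nominations each student receives."""
    status = {student: 0 for student in nominations}
    for chosen in nominations.values():
        for peer in chosen:
            status[peer] += 1
    return status


def nomination_density(nominations, classroom):
    """Structural variable (one assumed index): share of possible within-class nominations made."""
    made, size = defaultdict(int), defaultdict(int)
    for student, cls in classroom.items():
        size[cls] += 1
        made[cls] += sum(1 for peer in nominations[student] if classroom[peer] == cls)
    # Each of n students can nominate n - 1 classmates.
    return {cls: made[cls] / (size[cls] * (size[cls] - 1)) for cls in size}


print(sociometric_status(nominations))  # {'ann': 2, 'bob': 1, 'cat': 1, 'dan': 0, 'eve': 2, 'fay': 0}
print(nomination_density(nominations, classroom))
```

Status is a property of each student that only makes sense relative to the other students, while density exists only at the classroom level and summarizes the whole pattern of lower-level relations.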

It follows that constructing analytical or structural variables involves aggregation, whereas constructing contextual variables involves disaggregation. In the graphic above, aggregation is denoted by a right-facing arrow (->) and disaggregation by a left-facing arrow (<-).
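The two arrows can be sketched in a few lines of Python. The records, school names, and the functions `aggregate` and `disaggregate` are hypothetical; the sketch uses the school mean and the population standard deviation as analytical variables, following the examples in the typology above.

```python
from statistics import mean, pstdev

# Hypothetical student records: (student id, school, achievement score).
students = [
    ("s1", "North", 70), ("s2", "North", 80), ("s3", "North", 90),
    ("s4", "South", 60), ("s5", "South", 60), ("s6", "South", 72),
]


def aggregate(records):
    """Aggregation (->): build analytical school variables from student scores."""
    by_school = {}
    for _, school, score in records:
        by_school.setdefault(school, []).append(score)
    return {
        # Mean level and population SD (heterogeneity) of each school's scores.
        school: {"mean": mean(scores), "sd": pstdev(scores)}
        for school, scores in by_school.items()
    }


def disaggregate(records, school_vars, name="mean"):
    """Disaggregation (<-): attach a school variable to every student in that school."""
    return [(sid, school, score, school_vars[school][name]) for sid, school, score in records]


school_vars = aggregate(students)
print(school_vars)
for row in disaggregate(students, school_vars):
    print(row)  # each student now carries their school's mean as a contextual value
```

Once the school mean is pushed back down to the students it behaves as a contextual variable: all three North students carry the same value, regardless of their own scores.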

The ability to aggregate or disaggregate variables might seem to suggest that variables from different levels could simply be moved to one level and analyzed with a straightforward single-level analysis. Two problems arise, however, that help motivate the application of multilevel techniques:

Statistical: Aggregated variables lose information in the aggregation process and leave fewer units for the higher-level analysis, which lowers statistical power; real effects can therefore fail to be detected (Type II error). Disaggregated variables, on the other hand, treat lower-level units in the same group as if they were independent observations, which understates standard errors and tends to produce "significant" results that are entirely spurious (Type I error).
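The Type I half of the statistical problem can be illustrated with a small simulation, offered as a sketch under stated assumptions. Students are nested in classrooms with a random classroom effect, and a classroom-level predictor `z` is generated with no true effect on the outcome. The naive analysis disaggregates `z` to every student and runs an ordinary least squares t-test as if all 750 students were independent; the alternative aggregates the outcome to classroom means and tests with 30 units. The variance components, sample sizes, seed, and function names are hypothetical choices. Kish's design effect, 1 + (m - 1)ρ for clusters of size m with intraclass correlation ρ, is included to show how much the naive variance is understated.

```python
import random

# Hypothetical simulation settings (not taken from the post).
N_CLASSES, CLASS_SIZE = 30, 25
TAU2, SIGMA2 = 0.2, 0.8      # between- and within-class variance, so ICC = 0.2
T_CRIT_NAIVE = 1.963         # approx. two-sided 5% t critical value, 750 - 2 df
T_CRIT_AGG = 2.048           # two-sided 5% t critical value, 30 - 2 = 28 df


def mean(values):
    return sum(values) / len(values)


def ols_t(x, y):
    """t statistic for the slope in a simple OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = (ssr / (len(x) - 2) / sxx) ** 0.5
    return slope / se


def design_effect(cluster_size, icc):
    """Kish design effect: variance inflation from ignoring clustering."""
    return 1 + (cluster_size - 1) * icc


def rejection_rates(reps=400, seed=1):
    """Share of replications in which each analysis rejects H0: slope = 0 (true by design)."""
    rng = random.Random(seed)
    naive = aggregated = 0
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(N_CLASSES)]            # class predictor, no true effect
        u = [rng.gauss(0, TAU2 ** 0.5) for _ in range(N_CLASSES)]  # random class effects
        z_students, y_students, y_means = [], [], []
        for j in range(N_CLASSES):
            ys = [u[j] + rng.gauss(0, SIGMA2 ** 0.5) for _ in range(CLASS_SIZE)]
            z_students += [z[j]] * CLASS_SIZE  # disaggregate z to every student
            y_students += ys
            y_means.append(mean(ys))           # aggregate y to the class mean
        naive += abs(ols_t(z_students, y_students)) > T_CRIT_NAIVE
        aggregated += abs(ols_t(z, y_means)) > T_CRIT_AGG
    return naive / reps, aggregated / reps


print(f"design effect: {design_effect(CLASS_SIZE, TAU2 / (TAU2 + SIGMA2)):.1f}")
naive_rate, aggregated_rate = rejection_rates()
print(f"naive (disaggregated) Type I rate: {naive_rate:.3f}")
print(f"aggregated Type I rate:            {aggregated_rate:.3f}")
```

With an intraclass correlation of 0.2 and 25 students per classroom the design effect is 5.8, so the naive standard error is too small by a factor of roughly 2.4, and the naive rejection rate should come out far above the nominal 5%. The aggregated test should stay close to 5%, but it rests on only 30 units, which is the loss of power described above.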
Conceptual: If results are not interpreted carefully, conclusions drawn at one level from an analysis performed at another level can be spurious (the wrong level fallacy). Formulating inferences at a higher level from data analyzed at a lower level can commit the atomistic fallacy. Formulating inferences at a lower level from data aggregated at a higher level can commit the ecological fallacy.
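A toy example, with numbers invented to make the pattern obvious, shows how the ecological fallacy can arise. Three hypothetical schools each contain three students with scores on two variables, `x` and `y`; the helper `pearson` is written out so the sketch needs only plain Python.

```python
# Hypothetical (x, y) scores for three students in each of three schools.
schools = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}


def mean(values):
    return sum(values) / len(values)


def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Aggregated analysis: correlate the school means.
school_means = {k: (mean([x for x, _ in s]), mean([y for _, y in s])) for k, s in schools.items()}
between = pearson(list(school_means.values()))

# Within-school analysis: correlate deviations from each school's own means.
within = pearson([
    (x - school_means[k][0], y - school_means[k][1])
    for k, s in schools.items()
    for x, y in s
])

# Student-level analysis that ignores schools altogether.
total = pearson([pair for s in schools.values() for pair in s])

print(f"between schools: r = {between:+.2f}")  # +1.00
print(f"within schools:  r = {within:+.2f}")   # -1.00
print(f"all students:    r = {total:+.2f}")    # +0.80
```

The school means line up perfectly (r = +1.00), yet within every school higher `x` goes with lower `y` (r = -1.00). Reading the school-level result as a statement about individual students would commit the ecological fallacy; the pooled student-level correlation (+0.80) mixes both sources of variation and matches neither.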

Conceptual problems (the wrong level fallacies) will be discussed in more detail in a future post. For now, it is important to note that multilevel techniques, which preserve the hierarchical structure of the data even when aggregated and disaggregated variables enter the analysis, seek to avoid the conceptual errors that can occur when the entire analysis is carried out with aggregated and disaggregated variables at a single level. An important question is how well hierarchical linear modeling (HLM) techniques actually protect against wrong level fallacies.