Sunday, December 18, 2011

Multilevel Research Conceptual Problems: Wrong Level Fallacies

In a prior post on the definition of multi-level variables (here), I described two well-known conceptual problems that can result from single-level analysis of aggregated and disaggregated data. The two problems, the atomistic fallacy and the ecological fallacy, will be discussed in more detail in this post. I will also discuss Simpson's paradox, a related fallacy that arises when data from different groups are combined.

The atomistic and ecological fallacies involve the same problem taken from different perspectives in the hierarchy. When you formulate inferences at a higher level based on data gathered at a lower level, you can commit the atomistic fallacy. For example, it cannot be assumed that the negative relationship between infant mortality and individual income is the same at the individual level as at the country level. High income inequality might mediate the effect at the aggregate level.

Formulating inferences at a lower level from data measured at a higher level can commit the ecological fallacy. The issue is particularly important in medicine, where health care providers attempt to treat individual patients based on aggregate data about treatment effectiveness. Individuals show much more variability than aggregate populations, and relationships at the aggregate level can be the reverse of what holds at the lower level. I'll discuss a specific example involving kidney stone treatment below, but the issue becomes clearer when we consider the details of Simpson's paradox.

Simpson's paradox involves correlations that are present within different groups but reversed when the groups are combined. Consider the graph presented above. In the subgroups there is a positive relationship between X and Y (for example, take X to be height and Y to be income for two groups, males and females). For the overall population (the dotted black line), however, the regression relationship is negative.

The reason for the paradoxical relationship is that the two groups differ in their intercepts, and therefore in their group means: the groups differ not only in average height but also in average income. When the group with the greater average height has the lower average income, the line fitted to the combined data slopes downward even though income rises with height within each group. The aggregate-level relationship is driven by the difference in intercepts in the regression equations from which the graphs were produced (see the note below explaining how the graphs were constructed).
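
To make the mechanism concrete, here is a minimal R sketch of the same kind of construction. It is not the code linked in the note below; the numbers and variable names are invented, with the shorter group given the higher intercept so that the reversal appears:

    set.seed(42)
    n      <- 100
    group  <- rep(c("F", "M"), each = n)
    height <- c(rnorm(n, mean = 63, sd = 2),   # shorter group on average
                rnorm(n, mean = 70, sd = 2))   # taller group on average
    income <- ifelse(group == "F", 60, 40) +   # different intercepts ...
      0.5 * height +                           # ... same positive within-group slope
      rnorm(2 * n, sd = 2)
    d <- data.frame(height, income, group)

    coef(lm(income ~ height, data = d, subset = group == "F"))["height"]  # about +0.5
    coef(lm(income ~ height, data = d, subset = group == "M"))["height"]  # about +0.5
    coef(lm(income ~ height, data = d))["height"]                         # negative: the reversal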

A simplified approach to random coefficient models (another name for HLMs) was introduced by Gumpertz and Pantula (here). In their approach, coefficients are first estimated within groups and then analyzed at the aggregate level with the estimated coefficients as data. One simple analysis would be to average the coefficients from each of the groups. The resulting regression curve (see the note below) is plotted above in green as the aggregate curve. Done this way, Simpson's paradox does not emerge.
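
Continuing with the simulated data frame d from the sketch above (my own illustration of the idea, not Gumpertz and Pantula's code), the two-stage approach and the effect of simply including group in the regression look roughly like this:

    # Stage 1: estimate an OLS regression separately within each group
    stage1 <- lapply(split(d, d$group),
                     function(g) coef(lm(income ~ height, data = g)))

    # Stage 2: treat the estimated coefficients as data; here, just average them
    colMeans(do.call(rbind, stage1))   # aggregate intercept and slope; the slope stays positive

    # The reversal also disappears when group is included in a single-level regression
    coef(lm(income ~ height + group, data = d))["height"]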

There are many real-world examples (here) of wrong level fallacies, all of which involve units of analysis within hierarchies:
  • Civil Rights Act of 1964: Overall, a higher proportion of Republicans than Democrats voted for the Act, contrary to expectation. Legislators, however, are nested not only within states and parties but also within regions. Regional affiliation turned out to be decisive: the Act was overwhelmingly opposed by Southern legislators, most of whom were Democrats.
  • Kidney Stone Treatment: Two treatments for kidney stones (here) seemed to have similar rates of success. However, when the data were disaggregated by disease seriousness (the size of the kidney stones), one of the treatments produced better results in both subgroups. The problem was that the treatments had been applied differentially based on case severity. This particular example of Simpson's paradox would typically be handled with a covariate for kidney stone size; the harder problem is to identify, with causal arguments, what that covariate should be. A multi-center clinical trial might have picked up the difference if case severity was not evenly distributed across centers.
  • Berkeley Gender Bias: Aggregate admissions data from UC Berkeley appeared to show bias against women applicants (here). When the data were disaggregated by department (recognizing the hierarchical organization of departments within academic institutions), it was found that there was, if anything, a slight bias in favor of admitting women.
  • Prenatal Care and Infant Survival: Bishop et al. (1975, pp. 41-42, here) provide an example involving prenatal care and infant survival. The apparent association disappears when the data are considered separately for each clinic involved. In other words, the overall association appeared only because the actual hierarchy (infants within clinics) was ignored.
Judea Pearl (here) has argued that fallacies such as Simpson's paradox cannot be resolved by statistical techniques alone. Pearl's argument, based on directed graphs, does not specifically include references to hierarchical models. Since HLMs are specifically designed to handle groups nested within higher-order systems, they hold out the possibility of providing the theoretical rationale called for when identifying causal variables. I'll cover Pearl's argument and directed-graph representations in a future post.

NOTE: The graphs above were produced in R. The first graph involves three regression equations and demonstrates the reversal (see the code here). The reversal disappears when group is included in the regression equation (see the R code here).


Multilevel Research: Definition of Variables


In a prior post (here), I described a typical hierarchical data structure of students nested within classrooms, nested within schools, nested within districts. Terminology has developed for describing types of variables at different levels in the hierarchy.

The typology in the graphic above was taken from Hox (2000, here) and Hox (1995, here), which in turn drew on Lazarsfeld and Menzel (1961, here):
  • Global or absolute variables refer only to the level at which they are defined, e.g., student intelligence or gender.
  • Relational variables belong to a single level but describe relationships to other units of analysis at the same level. Sociometric status, for example, measures the extent to which someone is liked or disliked by their peers.
  • Analytical variables are constructed from variables at a lower level. School mean achievement, for example, would refer to the achievement levels of students measured at the lower level and aggregated upward. Statistics other than means can also be used, for example, standard deviations to measure heterogeneity.
  • Structural variables refer to the distribution of relational variables at the lower level. Social network indices (such as social capital) are one example.
  • Contextual variables refer to the higher-level units. All units at the lower level receive the value of the higher-level variable, which describes the "context" of the units being measured.
It should be clear that constructing analytical or structural variables involves aggregation and that contextual variables involve disaggregation. In the graphic above, aggregation is denoted by the right-facing arrow, ->, while disaggregation is denoted by the left-facing arrow, <-.
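
As a small illustration (a sketch with invented data, not code from any of the posts linked above), an analytical variable and a contextual variable might be constructed in R as follows for students nested within schools:

    set.seed(1)
    students <- data.frame(school  = rep(c("A", "B", "C"), each = 4),
                           achieve = rnorm(12, mean = 50, sd = 10))

    # Aggregation (->): an analytical variable, the school mean of student achievement
    school_means <- aggregate(achieve ~ school, data = students, FUN = mean)
    names(school_means)[2] <- "school_mean_achieve"

    # Disaggregation (<-): a contextual variable, the school-level value
    # assigned back to every student in that school
    students <- merge(students, school_means, by = "school")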

The ability to aggregate and disaggregate variables might suggest that variables from different levels could simply be moved up or down the hierarchy to perform a straightforward single-level analysis. Two problems arise, however, that help motivate the application of multi-level techniques:
  • Statistical: Aggregated variables lose information in the aggregation process and result in lower statistical power at the higher level; that is, real effects can fail to reach significance (Type II error). Disaggregated variables, on the other hand, inflate the apparent number of independent observations and tend to produce "significant" results that are entirely spurious (Type I error).
  • Conceptual: If results based on aggregated or disaggregated data are not interpreted carefully, conclusions can be drawn at the wrong level (the wrong level fallacy). Formulating inferences at a higher level based on data gathered at a lower level can commit the atomistic fallacy; formulating inferences at a lower level from data measured at a higher level can commit the ecological fallacy.
Conceptual problems (wrong level fallacies) will be discussed in more detail in a future post. For the present, it is important to note that multi-level techniques preserve the hierarchical structure of the data (even with aggregated and disaggregated variables in the analysis) and thereby seek to avoid the conceptual errors that arise when the entire analysis is performed with aggregated and disaggregated variables at a single level. An important question is how well HLM techniques actually protect against wrong level fallacies.

Friday, December 16, 2011

Hierarchical Linear Models, Overview


Hierarchical Linear Models (HLMs) offer specialized statistical approaches to data that are organized in a hierarchy. A typical example (above) is pupils nested within classes, nested within schools, nested within school districts. If your research explores the relationship between individuals and society, HLMs will be of interest.

A conventional analysis of a hierarchical data set might (1) ignore the hierarchy and just sample students or (2) include indicator or dummy variables for the classes, schools, and districts within which lower levels are nested. The issues raised by HLMs involve understanding the effects of including or failing to include hierarchical effects, and exactly how those effects are included.
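
As a sketch of those two conventional options in R (the data, variable names, and effect sizes below are invented simply to fix ideas):

    set.seed(1)
    pupils <- data.frame(school = rep(1:10, each = 20), ses = rnorm(200))
    school_effect  <- rnorm(10, sd = 8)            # unobserved school-level effects
    pupils$achieve <- 50 + 5 * pupils$ses +
      school_effect[pupils$school] + rnorm(200, sd = 5)

    # (1) ignore the hierarchy: pool all pupils in a single regression
    pooled  <- lm(achieve ~ ses, data = pupils)
    # (2) include indicator (dummy) variables for the schools
    dummies <- lm(achieve ~ ses + factor(school), data = pupils)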

HLMs have been around under different names (mixed models, random effects models, variance components models, nested models, multi-level models, etc.) since the beginning of modern statistics but only became popular in the 1990s. This is a little strange, since it is hard to think of a unit of analysis one might study that is not nested within some hierarchy (think about patients, countries, firms, workers, etc.). It should also be pointed out that individuals are not always the lowest level of analysis; roles and repeated observations on individuals have also been used to define the lowest level.

Why have HLMs only recently become popular? I can offer three reasons. (1) The slow (very slow) diffusion of ideas from General Systems Theory (GST). Entire academic disciplines have developed around atomistic ideas describing their units of analysis (think of homo economicus in Economics), and the ideas of GST have, understandably, met with a lot of resistance. And, let's face it, analyzing systems is more difficult than analyzing individuals, and nobody said analyzing individuals was easy in any event. (2) The world is becoming more interconnected. In a globalized world, systems become more important and more determinant. (3) Probably most importantly, software is now available in the major statistical packages (HLM, SAS, SPSS, Stata, and R).

This does not mean, however, that studies conducted using atomistic (pooled) analysis are wrong. If the hierarchy doesn't affect the unit of analysis you are studying, parsimony suggests that you ignore it. In other words, a case always has to be made that HLMs are appropriate for your sample. And there is some evidence that misapplication of HLMs can obscure otherwise significant results.

Consider the simple two-level linear model:

    Level 1:  Y_ij = a_j + b_j Z_ij + ε_ij

    Level 2:  a_j = γ_00 + γ_01 W_j + μ_0j
              b_j = γ_10 + γ_11 W_j + μ_1j

The lower-case letters, a and b, are the first-level ordinary least squares (OLS) parameters; epsilon is the first-level error term; Y is the dependent variable; and Z is the independent variable. At the second level, the gammas are the second-level parameters, W is the second-level dummy variable (or variables), and the mu's are the second-level error terms. Notice that the first-level parameters, a and b, are modeled as functions of the higher-level system effects.

Just to keep things simple, I'll assume that the higher-level systems affect only the intercept terms in the model:

    a_j = γ_00 + γ_01 W_j + μ_j,    b_j = b

    Y_ij = γ_00 + γ_01 W_j + b Z_ij + μ_j + ε_ij

After substitution, it's easy to see that we added a separate error term, mu. Adding more error can have the effect, if the added error is large, of obscuring otherwise significant effects.

Another way to see this is to look at the distributional assumptions:

    OLS:        ε ~ N(0, σ² I_n)
    WLS / GLS:  ε ~ N(0, σ² Σ)

For OLS, the assumption is that there is a single variance for all the normally distributed observations: the error terms are distributed normally with mean zero and the same variance for every observation, where n is the number of observations and I_n is an identity matrix. For the weighted least squares model, there is an added matrix of weights, upper-case sigma (Σ), that accounts for the different variances and covariances among the individual error terms. Variability in the variances is called heteroscedasticity and can be tested with Levene's test.
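
For example, continuing with the simulated pupils data from the earlier sketch, heterogeneity of the residual variance across schools could be checked with Levene's test as implemented in the car package (one common implementation; this is only a rough illustration, not a full set of diagnostics):

    library(car)                                   # provides leveneTest()
    pupils$school <- factor(pupils$school)
    pupils$res    <- resid(lm(achieve ~ ses, data = pupils))
    leveneTest(res ~ school, data = pupils)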

Basically, HLMs handle heteroscedasticity by estimating the error components and then dividing out the error with some form of Generalized Least Squares (GLS). However, if the variances are large compared to the estimated parameters (think t-statistics), significant effects can be obscured.
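
As a concrete sketch of a random-intercept fit (again using the simulated pupils data from above and the lme4 package, which estimates these models by maximum likelihood or REML rather than an explicit two-step GLS):

    library(lme4)
    # Random intercepts: school effects enter only through the intercept, as in the
    # simplified model above; the school-level and pupil-level variance components
    # are estimated and used to weight the fixed-effect estimates.
    fit <- lmer(achieve ~ ses + (1 | school), data = pupils)
    summary(fit)    # compare the ses coefficient and its standard error with the pooled lm
    VarCorr(fit)    # estimated variance components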

In future posts, I'll go through some of the issues raised by HLMs in much more detail. Here's a partial list of the issues as I currently understand them:
  • Are HLMs enhancing theory or are they just methodologies for partitioning variance?
  • What is the sampling model underlying HLMs, and what does it imply for statistical conclusions? Are we sampling at each level in the hierarchy or just at the lowest level? If only at the lowest level, what are the consequences?
  • What are the implications of analyses derived from multi-level research? For example, does hierarchical analysis imply, in the example above, that students could be transferred to different classes, different schools, and different districts and have, for example, improved academic performance? What if the new school is not in the sample? How is it to be coded as a new dummy variable in the model?
  • What is the meaning of aggregated variables? For example, can we aggregate student intelligence scores to the school level and move the intelligence variable to a higher level? How do we interpret such aggregation?
  • Does statistical power analysis at the individual level cover HLMs or is some other approach needed (power analysis determines how large a sample is necessary to observe a significant result)?
  • There are many methods by which hierarchical analysis can be conducted, from simple to very complex. How well do these various approaches compare to each other and compare to a naive pooled analysis?
  • How powerful are the tests for heteroscedasticity, that is, can these tests be relied on to tell us when HLMs are appropriate?

In future posts I'll go through a presentation by Doug Bates, an expert on HLMs at UW-Madison, from a workshop he gave at the University of Lausanne in 2009 (here). It's an excellent presentation, and it gives me the opportunity to explore the questions raised above.

Next summer, I might have an opportunity to teach HLMs at the University of Tennessee in Knoxville. A tentative description of the course is listed below. Blog postings will basically allow me to serialize my lecture notes.

STATISTICAL ANALYSIS OF HIERARCHICAL LINEAR MODELS

Hierarchical linear models (HLMs) allow researchers to locate units of analysis within hierarchical systems, for example, students within school systems, patients within treatment facilities, firms within industries, states within federal governments, countries within regions, regions within the world system, etc. In fact, it would be rare to find a unit of analysis that was not situated within some higher-order system. This does not mean that HLMs can be applied indiscriminately. If higher-order systems are not contributing significant variation to your unit of analysis, HLMs can obscure otherwise significant effects. This course will take a critical look at HLMs using computer-intensive techniques for evaluating alternative estimators. The course will cover both parametric and nonparametric estimators, including the classes of OLS, WLS, and GLS estimators, tests for homogeneity of variance and normality, statistical power analysis, the EM algorithm, classes of ML estimators, the bootstrap, and rank transformation (RT) models. As a course project, students will either do a comparative methodological study or analyze an existing hierarchical data set. The R statistical package will be used for in-class demonstrations and method studies. For data analysis, students can work either in R or in other available packages with HLM capabilities (SAS, SPSS, HLM, Stata, etc.). A course in regression analysis and basic computer literacy are prerequisites. Bring your laptop computer to class and, if possible, have R installed on your machine. R is free and can be obtained from http://www.r-project.org/.