Saturday, January 28, 2012

How Complicated Models Broke the NHT Paradigm

In a prior post (here) I pointed out the weaknesses of the Null Hypothesis Testing (NHT) paradigm using a simple data set on automobile mileage (here). What I demonstrated was that if you arrive at the end of the NHT process (described here) with an insignificant result, you can always try to increase the sample size (when that is possible) until even the small difference you observed becomes statistically significant. The ability to manufacture significant results by manipulating effect size and sample size should make everyone a little uneasy about the body of scientific work that has been generated under the NHT paradigm. In this post, I will return to the beginning of the NHT process and ask what types of theories generate the research hypotheses on which NHT is based. The conclusion is that if the theories always generate "injection models," then all the weaknesses of NHT apply. If, however, the theories generate multiple competing models rather than a research hypothesis and its null, there is a way out of the NHT dead end.
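To make the sample-size point concrete, here is a minimal sketch (assuming Python with statsmodels; the effect sizes are illustrative and not taken from the mileage data) of how the sample needed for "significance" grows as the effect shrinks:

```python
# Illustrative power calculation: how large a sample is needed to declare
# ever-smaller group differences "significant" at the usual thresholds.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
for d in (0.8, 0.5, 0.2, 0.05):  # Cohen's d, from large to trivial
    n = power.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"d = {d}: about {n:.0f} subjects per group")
# n grows roughly as 1/d^2, so any nonzero difference can be made
# "significant" by collecting enough data.
```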

At first, it might seem that no single class of theories could possibly generate all research hypotheses, given the breadth and depth of scientific theories in current use. If we classify the theories most often used in classical NHT by the types of models involved, however, one general class, the "injection models," turns out to be particularly well suited to the NHT paradigm.

By a model I mean a structural equation model and its associated directed graph. For example, the model underlying the research hypothesis about automobile miles per gallon (MPG) is displayed in the path diagram above. The independent variable we focused on was the type of transmission, but obviously many other factors go into the determination of MPG. Models of this type can always be written as a general linear model and can often be tested by simple linear regression. They are all "injection models" because the effects of the independent variables are "injected" into the dependent variable. The null hypothesis is always that there is no injection.
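As a concrete illustration, here is a minimal sketch of the mileage injection model (Python with pandas and statsmodels; the file name and the column names mpg and am are hypothetical, echoing R's mtcars data):

```python
# Fit the injection model mpg = b0 + b1*am + error, where am codes the
# transmission type, and test H0: b1 = 0 (nothing is "injected" into mpg).
import pandas as pd
import statsmodels.formula.api as smf

cars = pd.read_csv("mtcars.csv")  # hypothetical file with mpg and am columns
fit = smf.ols("mpg ~ am", data=cars).fit()
print(fit.summary())  # the t-test on the am coefficient is the NHT test
```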


There is a problem, however, with the causal implications of regression models expressed in mathematical terms. The first equation above is the standard regression equation. If y = MPG and x ∈ {0, 1} indicates whether the car has an automatic or a manual transmission, then we have an injection model for automobile mileage. Mathematical objects, however, are symmetric, and nothing prevents us from re-expressing the model to predict the type of transmission from MPG. While predicting the type of transmission from MPG might be an interesting exercise, we know a priori that mileage doesn't "cause" a car to have a particular type of transmission. We need something beyond mathematics to describe causal relationships.
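In symbols (a plausible reconstruction of the equations being discussed, with y as MPG and x as the transmission indicator):

```latex
% Standard regression equation and its purely algebraic inversion:
y = \beta_0 + \beta_1 x + \varepsilon
\qquad\Longleftrightarrow\qquad
x = \frac{y - \beta_0 - \varepsilon}{\beta_1}, \qquad \beta_1 \neq 0
```

Nothing in the algebra privileges the first form over the second; the asymmetry is causal, not mathematical.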

Causal diagrams make the direction of causality clear from the direction of the arrow connecting two variables. The importance of path diagrams for encoding causal information was discovered by the geneticist Sewall Wright in the 1920s. Wright retired to the University of Wisconsin in 1955, where he influenced economists and sociologists.
With path diagrams it became relatively easy to develop complicated causal models that do not lend themselves to restatement as a simple null hypothesis. For example, in the path diagram above (from Pedhazur and Kerlinger, 1982 and Karl Wuensch, here), socioeconomic status (SES) is taken as an exogenous variable that influences IQ, need for achievement (nAch) and, ultimately, grade point average (GPA). The variables at the receiving end of causal arrows (IQ, nAch and GPA) are considered endogenous. It is important to this theory of grade-point determination that IQ and nAch mediate the influence of SES on GPA. That is, you will find lower-SES students with high GPAs as a result of their IQ and their need for achievement. You will also find some high-SES students with neither high IQ nor high need for achievement, but they will probably still have better GPAs than low-SES students.
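In practice, a recursive path model like this can be estimated as a series of regressions, one for each endogenous variable. A minimal sketch under one reading of the diagram (Python with statsmodels; the file gpa_study.csv and the column names ses, iq, nach and gpa are all hypothetical):

```python
# Path analysis as recursive regressions: each endogenous variable is
# regressed on its causal parents in the diagram.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("gpa_study.csv")                          # hypothetical data
eq_iq   = smf.ols("iq ~ ses", data=df).fit()               # SES -> IQ
eq_nach = smf.ols("nach ~ ses + iq", data=df).fit()        # SES, IQ -> nAch
eq_gpa  = smf.ols("gpa ~ ses + iq + nach", data=df).fit()  # all -> GPA

# With standardized variables, the coefficients are path coefficients, and
# the indirect SES -> GPA effect is the product of paths through IQ and nAch.
for name, eq in (("IQ", eq_iq), ("nAch", eq_nach), ("GPA", eq_gpa)):
    print(name, eq.params.round(3).to_dict())
```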

In the NHT paradigm, path models have to be converted into injection models, as in the graphic above. In the "testable" NHT model, SES, IQ and nAch are all treated as exogenous variables determining grade point. The simple null hypothesis would be that none of these factors has a statistically significant impact on GPA. The idea of mediating or intervening variables (IQ and nAch mediating the influence of SES on grade point) is dropped. Even this reformulated model, however, is not enough for hard-core experimentalists.
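The injection-model reformulation collapses the three equations above into one, with a single joint null. A minimal sketch, reusing the hypothetical gpa_study.csv frame:

```python
# The "testable" NHT version: SES, IQ and nAch all exogenous, mediation dropped.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("gpa_study.csv")
flat = smf.ols("gpa ~ ses + iq + nach", data=df).fit()
print(f"F = {flat.fvalue:.2f}, p = {flat.f_pvalue:.4f}")
# H0: nothing is "injected" into GPA (all slopes zero). The paths through
# IQ and nAch, the substance of the original theory, are no longer tested.
```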

For the hard-core experimentalists, Pedhazur and Kerlinger have to give up any idea of drawing causal links from SES and IQ to GPA because these variables cannot be experimentally manipulated. It would not be possible to conduct an Eliza Doolittle experiment in which subjects were randomly assigned to have their SES manipulated so that changes in GPA could be observed. It would also not be practical to manipulate IQ. Since it might be possible to manipulate nAch (annotated as do_nAch in Judea Pearl's terminology), that is the one link the experimentalists would allow, and they would require that all extraneous factors be eliminated through random assignment. That we are no longer testing Pedhazur and Kerlinger's theory is not an issue, because the theory is not really testable within the NHT paradigm.
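A toy simulation shows what do_nAch buys the experimentalist. The coefficients below are made up for illustration; the point is only that random assignment severs the arrow from SES into nAch:

```python
# Toy version of Pearl's do-operator: compare an observational world, where
# nAch depends on SES, with an experimental world, where nAch is randomized.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
ses = rng.normal(size=n)

# Observational world: SES drives both nAch and GPA (assumed coefficients).
nach_obs = 0.5 * ses + rng.normal(size=n)
gpa_obs = 0.3 * ses + 0.4 * nach_obs + rng.normal(size=n)

# Experimental world: do(nAch) assigns nAch at random, independent of SES.
nach_do = rng.normal(size=n)
gpa_do = 0.3 * ses + 0.4 * nach_do + rng.normal(size=n)

# Regressing GPA on nAch alone is confounded observationally, not under do().
print(np.polyfit(nach_obs, gpa_obs, 1)[0])  # > 0.4 (SES confounds the slope)
print(np.polyfit(nach_do, gpa_do, 1)[0])    # ~ 0.4 (the true causal effect)
```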

Unfortunately, there are countless important research questions for which experimentation is neither possible nor desirable, global climate change being just one example (we simply cannot manipulate the world system). Instead, we are forced to study intact populations or intact systems using models. Path diagrams provide a useful and general (that is, nonparametric) way of describing causality based on an understanding of the system being studied. Contending models always exist in various stages of testing. NHT is well suited to the critical experiment in which a hypothesis can be directly tested, and even there we never really test the research hypothesis, only its null alternative. The focus of the NHT statistician is on hypotheses; the focus of scientists is on models.

Returning to Pedhazur and Kerlinger's path model, not only did the absence of experimental design generate howls of protest from the experimental and statistical scolds (here and here, basically "correlation cannot prove causation"), but the approach to estimating path diagrams was also criticized because technical statistical requirements (the assumptions of classical normal distribution theory) were being violated. I'll get into these issues in a future post. For now, what is important to understand is that multiple, competing causal models (rather than hypotheses or statistical distributions) are the important products of science, and they always exist in various states of confirmation and acceptance. Statistical technique needs to be able to test models, not merely hypotheses. Multi-model development, testing and selection provides one way out of the NHT dead end.
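As a final sketch of the multi-model alternative (again assuming the hypothetical gpa_study.csv frame), competing models can be compared directly with an information criterion instead of a single null hypothesis:

```python
# Multi-model selection: rank rival specifications by AIC rather than
# testing one null hypothesis.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("gpa_study.csv")
candidates = {
    "SES only":        "gpa ~ ses",
    "SES + mediators": "gpa ~ ses + iq + nach",
}
fits = {name: smf.ols(f, data=df).fit() for name, f in candidates.items()}
for name, fit in sorted(fits.items(), key=lambda kv: kv[1].aic):
    print(f"{name}: AIC = {fit.aic:.1f}")
# The model with the lowest AIC is preferred; no null hypothesis required.
```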
