Friday, February 17, 2012

Peeking, Tweaking and Cross-Validation

The classical Null Hypothesis Testing (NHT) paradigm (see my prior posts on NHT here) is focused almost exclusively on the critical experiment. The critical experiment, for example the gold-standard clinical trial, is set up to ensure that a hypothesis is given a definitive test: does administration of a drug actually produce a significant reduction in the target disease? Competing explanations are controlled through random sampling and random assignment. Subjects are chosen randomly from some population and assigned randomly to treatment and control conditions. There may actually be multiple control groups, some receiving placebos, some receiving competitor drugs, and others receiving nothing. For this purpose, classical NHT is well suited, if not very cost-effective.

In the everyday world of science, however, classical NHT is easy to teach but hard to implement. Research, by its very definition, is mostly exploratory. Any interesting scientific question starts as a somewhat poorly formed idea. Concepts are a little unclear. Measurements of the concepts are open to question. Causal processes may be obscure or even unobservable. There needs to be a period of exploration, development, and more casual testing. Much of this period will involve exploratory data analysis.

Unfortunately, NHT doesn't really allow data exploration. Significance tests and their associated probability levels are based on the assumption that you have not peeked at your data. The probability levels also assume that you have not estimated your model once, found it inadequate, and gone on to tweak the model until it fits your sample data. In other words, NHT assumes that you are not involved in post-hoc analysis. Is there a solution to this dilemma?
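To see how quickly peeking and tweaking distort the nominal probability levels, here is a minimal simulation sketch in Python. The sample sizes and the number of "tweaks" are my own arbitrary assumptions, purely for illustration. Each simulated study draws pure noise, tries ten candidate predictors, and keeps the best p-value, the statistical equivalent of tweaking a model until it fits.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_sims = 2000    # simulated "studies"
n_obs = 50       # observations per study
n_tweaks = 10    # candidate predictors tried per study

false_positives = 0
for _ in range(n_sims):
    y = rng.normal(size=n_obs)  # outcome with no real effects
    # "Tweaking": try several unrelated predictors, keep the best p-value
    best_p = min(
        stats.pearsonr(rng.normal(size=n_obs), y)[1]
        for _ in range(n_tweaks)
    )
    if best_p < 0.05:
        false_positives += 1

# With 10 looks at pure noise, roughly 1 - 0.95**10, about 40 percent,
# of the "studies" report a significant result at a nominal alpha of .05.
print(f"nominal alpha: 0.05, observed Type I rate: {false_positives / n_sims:.3f}")
```

The advertised 5 percent error rate applies only to the first, untweaked test; every additional look at the data inflates it.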

In preparation for a research project I'm currently working on, I started re-reading the literature on Path Analysis. I haven't worked in this area since the 1980s, so I started back there and read forward. I ran into a 1983 article by Norman Cliff titled Some Cautions Concerning the Application of Causal Modeling Methods. Cliff makes a number of interesting general points that still seem quite current.

For the problem of peeking and tweaking, Cliff strongly recommends cross-validation. In cross-validation, a large sample is drawn, larger than is needed for adequate statistical power. Part of the sample is then sub-sampled and used for exploratory model development. The remaining part of the sample is used to test the tweaked model. Since you have not peeked at these data, you can fairly test the validity of the model, that is, cross-validate it.
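Here is a minimal sketch of that split-sample scheme, again in Python. The single-predictor data and the 50/50 split are illustrative assumptions on my part, not Cliff's specification; the point is simply that the holdout half is touched exactly once, by the final model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical survey of 1,000 cases with one real, modest effect
n = 1000
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)

# Split once, up front: half for exploration, half locked away
idx = rng.permutation(n)
explore, holdout = idx[: n // 2], idx[n // 2:]

# Peek and tweak as much as you like on the exploration half...
r_exp, p_exp = stats.pearsonr(x[explore], y[explore])

# ...then test the final, tweaked model exactly once on the holdout
r_out, p_out = stats.pearsonr(x[holdout], y[holdout])

print(f"exploration half: r = {r_exp:.2f}, p = {p_exp:.4f}")
print(f"holdout half:     r = {r_out:.2f}, p = {p_out:.4f}")
```

Because the holdout p-value comes from a single pre-specified test, it carries the interpretation that classical NHT promises.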

The project I am currently working on involves studying Globalization in Latin America. Multi-country analysis provides another type of cross-validation. In this case, the principal investigator has extensive expertise on a few countries, his greatest expertise being in Mexico. Since much of his understanding of the research problem is based on experience in Mexico, one approach would be to use Mexico as the model-development laboratory. Another approach would be to take some other country, say Argentina, and develop the model there, using one's understanding of Mexico to guide the work. This approach might produce a somewhat more general model.

In a future post, I will talk about how cross-validation worked in this specific context. I will also take up statistical power analysis and sample-size determination when a cross-validation is being attempted. There are a number of other interesting points in the Cliff (1983) paper, particularly on causality, that would be worth discussing.