The use of effect sizes as an alternative to interpreting p values alone has been recommended repeatedly (Dunnette, 1966; Huberty, 1987; Rosenthal, 1992; Vaughan & Corballis, 1969). An effect size measure is an indicator of the strength of association between two or more variables (independent and dependent). Effect sizes are "descriptive" rather than "inferential" statistics (Chow, 1996), but they complement inferential statistics such as the p value: the significance level is not determined by the effect size, nor vice versa. Common statistics for estimating effect size are ω², η², and Cohen's d.
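To make two of these measures concrete, here is a minimal stdlib-only Python sketch of Cohen's d (pooled-SD version) and η² for a simple between-groups design. The function names and toy data are my own illustration, not drawn from any of the cited sources.

```python
import statistics

def cohens_d(group1, group2):
    """Cohen's d: standardized mean difference using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

def eta_squared(*groups):
    """η²: between-group sum of squares over total sum of squares."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total

print(cohens_d([1, 2, 3], [3, 4, 5]))     # standardized difference of the two means
print(eta_squared([1, 2, 3], [3, 4, 5]))  # proportion of total variance between groups
```

Note that both are plain descriptive summaries of the sample: no p value is involved, which is exactly the point made above.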

The effect size concept already appears in everyday life. For example, a weight-loss program may claim to help clients lose 10 kg; the 10 kg is the claimed effect size. Another example is a tutoring program that claims to raise performance by one letter grade; here the letter grade is the claimed effect size.

Criticism of effect size: Favreau (1997) argued that even when an effect size is calculated, researchers still do not know whether all individuals differ substantially or only moderately from those in the comparison group. Levin (1967) pointed out a problem in interpreting the variance shared among variables. As an example, he described an experiment in which the shared variance was reported as 37%, but on closer inspection around 85% of that shared variance came from only one of the six experimental groups. This example illustrates how one superior group can blanket "success" over all the other groups and give the illusion of a large effect. Dooling and Danks (1975) offered another critique, arguing that psychology, given the nature of its experimental designs, is not yet ready to interpret effect sizes adequately. Howell (1989) noted that η² is easy to calculate but often yields an overestimate of the effect.
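Levin's point can be illustrated with a toy calculation (the numbers here are invented for illustration, not from his paper): six groups, five of which barely differ, with one outlying group supplying nearly all of the between-group sum of squares, so the overall η² looks impressive even though only one group drives it.

```python
from statistics import mean

# Hypothetical data: five near-identical groups and one "superior" outlier group.
groups = [[10, 11, 9], [10, 12, 9], [11, 10, 9], [9, 11, 10], [10, 10, 11],
          [20, 21, 19]]

scores = [x for g in groups for x in g]
grand = mean(scores)
ss_total = sum((x - grand) ** 2 for x in scores)

# Each group's contribution to the between-group sum of squares.
contributions = [len(g) * (mean(g) - grand) ** 2 for g in groups]
eta_sq = sum(contributions) / ss_total

print(f"eta^2 = {eta_sq:.2f}")  # large overall effect...
for i, c in enumerate(contributions, 1):
    # ...but one group supplies the overwhelming share of SS_between
    print(f"group {i} supplies {c / sum(contributions):.0%} of SS_between")
```

Running this shows the last group contributing well over 80% of the between-group variance, echoing the pattern Levin described.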

Kirk (1996) found that only 12% of articles published in the Journal of Experimental Psychology: Learning, Memory, and Cognition reported effect size measures when reporting statistical significance.

References

Chow, S. L. (1996). *Statistical significance: Rationale, validity and utility*. London: Sage Publications.

Dooling, D. J., & Danks, J. H. (1975). Going beyond tests of significance: Is psychology ready? *Bulletin of the Psychonomic Society*, *5*, 15-17.

Dunnette, M. D. (1966). Fads, fashions, and folderol in psychology. *American Psychologist*, *21*, 343-352.

Favreau, O. E. (1997). Sex and gender comparisons: Does null hypothesis testing create a false dichotomy? *Feminism & Psychology*, *7*, 63-81.

Howell, D. C. (1989). *Fundamental statistics for the behavioral sciences* (2nd ed.). Boston: PWS-Kent Publishing Company.

Huberty, C. (1987). On statistical testing. *Educational Researcher*, *16*, 4-9.

Kirk, R. E. (1996). Practical significance: A concept whose time has come. *Educational and Psychological Measurement*, *56*, 746-759.

Levin, J. R. (1967). Misinterpreting the significance of “explained variation.” *American Psychologist*, *22*, 675-676.

Rosenthal, R. (1992). Effect size estimation, significance testing, and the file-drawer problem. *Journal of Parapsychology*, *56*, 57-58.

Vaughan, G. M., & Corballis, M. C. (1969). Beyond tests of significance: Estimating strength of effects in selected ANOVA designs. *Psychological Bulletin*, *72*, 204-213.

Good blog this week 🙂 I just wondered if you had considered the question of whether we should report effect size in research? As you say, effect size has its good points. Cohen (1990)* said "the primary product of a research inquiry is one or more measures of effect size, not p values," and to an extent I agree with this statement. P values can be misleading in some cases. I'm sure from stats we now know that, holding the effect constant, as the number of participants in a sample goes up our p-value will decrease, and vice versa. Lang, Rothman and Cann (1998)** concluded that any information we gain from a p-value is ambiguous, as it depends on both the sample size and the underlying effect.

……Effect size held constant = Bigger sample = significance is more likely…….
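That relationship can be sketched with a little stdlib-only Python (no real data, just the algebra): for a fixed standardized effect d, the two-sample test statistic is roughly d·√(n/2), so a larger n pushes the statistic further into the tail and shrinks the p-value. A normal approximation to the t distribution is used here purely to keep the sketch self-contained.

```python
from math import sqrt
from statistics import NormalDist

def approx_p_value(d, n_per_group):
    """Two-sided p for a two-sample comparison with standardized effect d,
    using t ≈ d * sqrt(n/2) and a normal approximation to the t distribution."""
    t = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Same effect size (d = 0.3) at increasing sample sizes:
# the p-value drops below .05 only once the groups get large enough.
for n in (20, 50, 100, 200):
    print(n, round(approx_p_value(0.3, n), 4))
```

The effect size never changes in this loop; only the sample size does, yet "significance" appears and disappears with n.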

Reviews not just in psychology but also in mathematics education research (Slavin, Lake & Groff, 2007)*** and in medicine (Sterne, Gavaghan & Egger, 2000)**** have found that studies with smaller sample sizes tend to report larger effect sizes than studies with larger sample sizes.

I guess the real question is whether we should rely on p-values alone to make judgements about effects in the real world, and the simple answer is no. P-values are all well and good at showing that an effect exists (and in which direction), but they don't tell us how big the effect we have found is.

I guess this is why it is becoming increasingly common practice in many areas of research to report effect size. Many journals are beginning to request that effect sizes be reported, to give us a better understanding of our findings.

*http://www.indiana.edu/~stigtsts/quotsagn.html

**Lang, J. M., Rothman, K. J., & Cann, C. I. (1998). That confounded p-value. *Epidemiology*.

***Slavin, R. E., Lake, C., & Groff, C. (2007). Effective programs in middle and high school math: A best evidence synthesis.

****Sterne, J., Gavaghan, D., & Egger, M. (2000). Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature.