The use of effect size measures, as an alternative to interpreting the p value alone, has been recommended repeatedly (Dunnette, 1966; Huberty, 1987; Rosenthal, 1992; Vaughan & Corballis, 1969). An effect size measure is an indicator of the strength of association between two or more variables (independent and dependent). Effect sizes are “descriptive” rather than “inferential” statistics (Chow, 1996), but they complement inferential statistics such as the p value: the significance level is not determined by the effect size, nor vice versa. Common statistics for estimating effect size are ω², η², and Cohen’s d.
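As a concrete illustration of one of these statistics, the following is a minimal sketch of computing Cohen’s d as the standardized difference between two group means, using the common pooled-standard-deviation formulation (the function name and the sample numbers are illustrative, not from the sources cited here):

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: difference between group means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances with Bessel's correction (n - 1 in the denominator)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pooled SD weights each group's variance by its degrees of freedom
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

treatment = [12, 14, 15, 13, 16]  # hypothetical scores
control = [10, 11, 12, 10, 13]
print(round(cohens_d(treatment, control), 2))  # → 1.93
```

Note that d is expressed in standard-deviation units, so it is comparable across studies that measure the outcome on different scales.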

The concept of effect size is already found in everyday life. For example, a weight loss program may claim to help clients lose 10 kg; the 10 kg is the claimed effect size. Another example is a tutoring program that claims to raise performance by one letter grade; here, the letter grade is the claimed effect size.

Criticism of effect size: Favreau (1997) argued that even when the effect size is calculated, researchers still do not know whether all individuals differ substantially or only moderately from those in the comparison group. Levin (1967) pointed out a problem in interpreting the variance shared among variables. He cited an experiment in which the shared variance was reported as 37%, but on closer inspection about 85% of that shared variance came from only one of the six experimental groups. This example illustrates that one superior group may blanket “success” over all the other groups and give the illusion of a large effect. Dooling and Danks (1975) offered another critique: they argued that psychology, given the nature of its experimental designs, is not yet ready to interpret effect sizes adequately. Howell (1989) noted that η is easy to calculate but often yields an overestimate of the effect.
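Howell’s point about η overestimating the effect can be seen by comparing η² with ω² on the same one-way ANOVA summary. The sketch below uses the standard textbook formulas (η² = SS_between / SS_total; ω² corrects for the chance-level contribution of between-group variance); the numbers are hypothetical, not taken from any study cited above:

```python
def eta_and_omega_squared(ss_between, ss_within, k, n_total):
    """Effect size estimates from one-way ANOVA sums of squares.

    k is the number of groups, n_total the total sample size.
    eta^2 includes sampling error in SS_between, so it runs larger;
    omega^2 subtracts the expected chance contribution, (k - 1) * MS_within.
    """
    ss_total = ss_between + ss_within
    ms_within = ss_within / (n_total - k)
    eta2 = ss_between / ss_total
    omega2 = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eta2, omega2

# Hypothetical ANOVA summary: 3 groups, 30 participants
eta2, omega2 = eta_and_omega_squared(ss_between=40, ss_within=60, k=3, n_total=30)
print(round(eta2, 3), round(omega2, 3))  # → 0.4 0.348
```

For any real data, η² exceeds ω² in this way, which is why ω² is usually described as the less biased estimator of the population effect.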

Kirk (1996) found that only 12% of articles published in the Journal of Experimental Psychology: Learning, Memory, and Cognition reported effect size measures alongside statistical significance.

References
Chow, S. L. (1996). Statistical significance: Rationale, validity and utility. London: Sage Publications.


Dooling, D. J., & Danks, J. H. (1975). Going beyond tests of significance: Is psychology ready? Bulletin of the Psychonomic Society, 5, 15-17.


Dunnette, M. D. (1966). Fads, fashions, and folderol in psychology. American Psychologist, 21, 343-352.


Favreau, O. E. (1997). Sex and gender comparisons: Does null hypothesis testing create a false dichotomy? Feminism & Psychology, 7, 63-81.


Howell, D. C. (1989). Fundamental statistics for the behavioral sciences (2nd ed.). Boston: PWS-Kent Publishing Company.


Huberty, C. (1987). On statistical testing. Educational Researcher, 16, 4-9.


Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746-759.


Levin, J. R. (1967). Misinterpreting the significance of “explained variation.” American Psychologist, 22, 675-676.


Rosenthal, R. (1992). Effect size estimation, significance testing, and the file-drawer problem. Journal of Parapsychology, 56, 57-58.


Vaughan, G. M., & Corballis, M. C. (1969). Beyond tests of significance: Estimating strength of effects in selected ANOVA designs. Psychological Bulletin, 72, 204-213.