Somewhere in “The Boscombe Valley Mystery”, Sherlock Holmes says, “There is nothing more deceptive than an obvious fact”. Well, actually, I would rephrase the quotation as “There is nothing more deceptive than statistics”.

I was reading Daniel Kahneman’s awesome book “Thinking, Fast and Slow” (which I strongly recommend), when I found the following paragraph:

*Researchers measure the strength of relationships by a correlation coefficient, which varies between 0 and 1. The coefficient was defined earlier (in relation to regression to the mean) by the extent to which two measures are determined by shared factors. A very generous estimate of the correlation between the success of the firm and the quality of its CEO might be as high as .30, indicating 30% overlap* (p. 205).

Now, one of my eyebrows rose after reading the first sentence. Anybody who has ever had the dubious privilege of teaching statistics in psychology programs knows that one of the first things students learn about the correlation coefficient is that it can range from -1 to 1 – sadly, sometimes it is the only property they can list when asked. But that’s ok: if we’re speaking about the strength of the relationship, the sign is not an issue, so let’s get over it. But then I read that if the correlation between the variables is .30, there is 30% overlap. I might have been misinterpreting the whole passage, but when speaking of “shared factors” and “overlapping” I can’t help thinking of “shared variance”, and shared variance is definitely *not* indexed by the correlation coefficient. Instead, it’s indexed by the coefficient of determination, which is the squared correlation coefficient (and which, incidentally, does range from 0 to 1). So, if the correlation between two measures is .30, the shared variance is .30² = .09, i.e., 9%.

Some time ago I pointed out the same issue to a journalist, and, ironically, in the same book Kahneman (who is nothing less than a living legend of psychology, a Princeton emeritus professor, and a Nobel Prize laureate) tells about his own research on the ways in which statistics can be misunderstood, and provides very nice explanations of why this happens.