Understanding Covariance and Correlation | CFA Level I Quantitative Methods
Welcome back! In this introduction to covariance and correlation, we’ll look at their significance and how to calculate them. You’ll encounter these concepts again later in the lesson on portfolio return, so let’s jump right in.
Sample Mean and Sample Standard Deviation Revision
Let’s start with a quick revision on sample mean and sample standard deviation. Consider the quarterly returns of stock X for the past year. The mean quarterly return is 5%, and by applying the sample variance formula and taking the square root of the result, we get a standard deviation of sx = 6.22%.
Now let’s calculate the sample standard deviation for stock Y, given its quarterly returns. The mean return is 6%, and the sum of squared deviations is 219.94. Applying the sample standard deviation formula, you should get sy = 8.56%.
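The arithmetic for stock Y can be checked with a short sketch. It assumes n = 4 observations (four quarterly returns in one year); the helper name `sample_std_from_ss` is made up for illustration.

```python
import math

def sample_std_from_ss(sum_sq_dev: float, n: int) -> float:
    """Sample standard deviation from a sum of squared deviations.

    Divides by n - 1 (sample variance), then takes the square root.
    """
    return math.sqrt(sum_sq_dev / (n - 1))

# Figures from the lesson: sum of squared deviations = 219.94,
# n = 4 is an assumption (four quarters in the past year).
s_y = sample_std_from_ss(219.94, 4)
print(round(s_y, 2))  # 8.56
```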
Measuring Correlation between Two Variables
As an analyst, you’ll often need to determine if two different variables are correlated. Correlation is a measure of the linear relationship between two variables. The first step is to consider how the two variables move together, measured by the sample covariance. From our example, we calculate a covariance of 52.29.
Sample covariance: s_xy = Σ(X − X̅)(Y − Y̅) / (n − 1)
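As a rough illustration of the sample covariance formula, here is a minimal sketch in plain Python. The lesson does not list the individual quarterly returns, so the two series below are hypothetical and will not reproduce the 52.29 figure.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Product of each pair's deviations from its own mean
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross / (n - 1)

# Hypothetical quarterly returns in percent (NOT the lesson's data)
returns_x = [2.0, 4.0, 6.0, 8.0]
returns_y = [1.0, 3.0, 2.0, 6.0]

cov_xy = sample_covariance(returns_x, returns_y)
print(cov_xy)  # cross-deviations sum to 14, so this is 14 / 3
```

A positive result means the two series tend to be above (or below) their means at the same time; the size of the number still depends on the units of the data.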
However, the magnitude of the covariance is difficult to interpret because it depends on the units and scale of the two variables. This brings us to the normalized version of covariance, the correlation coefficient. We calculate it by dividing the covariance by the product of the standard deviations of the two variables, giving us a correlation coefficient of 0.98.
Correlation coefficient: r_xy = s_xy / (s_x × s_y)
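Plugging the lesson's figures into the correlation formula is a quick sanity check. The variable names below are illustrative only.

```python
# Figures reported in the lesson (returns measured in percent)
s_xy = 52.29  # sample covariance of X and Y
s_x = 6.22    # sample standard deviation of X
s_y = 8.56    # sample standard deviation of Y

# Correlation = covariance / (product of the standard deviations)
r_xy = s_xy / (s_x * s_y)
print(round(r_xy, 2))  # 0.98
```

Because the units in the numerator and denominator cancel, the result is a pure number that can be compared across different pairs of variables.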
Interpreting Correlation
Correlation ranges from -1 to 1:
- -1 indicates a perfect negative (inverse) linear relationship
- 0 indicates no linear relationship
- 1 indicates a perfect positive linear relationship
In our example, a correlation of 0.98 indicates a strong positive correlation between the quarterly returns of stock X and stock Y.
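The three reference points on the scale can be reproduced with small made-up series. This is only a sketch: the helper `pearson_r` and all data are hypothetical, and the last series is deliberately U-shaped to show that a correlation of 0 rules out a linear relationship, not every relationship.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient (the n - 1 terms cancel out)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4]
y_up = [3, 5, 7, 9]      # y = 2x + 1: perfect positive linear relationship
y_down = [7, 4, 1, -2]   # y = -3x + 10: perfect negative linear relationship
y_u = [1, -1, -1, 1]     # U-shaped pattern: related, but not linearly

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_u = pearson_r(x, y_u)
print(r_up, r_down, r_u)
```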
Using Scatter Plots to Visualize Correlation
A scatter plot is a useful tool for visualizing the relationship between two variables. For instance, if we plot the data from our example, we can observe a strong positive correlation between the two variables.
Limitations of Correlation Analysis
While correlation analysis is an essential tool for understanding the relationships between variables, it has limitations that analysts should keep in mind:
- Sensitivity to outliers: Correlation can be quite sensitive to outliers. Extreme outliers in the data can cause the correlation to drop to an insignificant value, leading the analyst to wrongly conclude that there is no meaningful correlation between the variables. As we’ve learned earlier, outliers can be dealt with using techniques like trimming or winsorization. However, it may not be that straightforward. Analysts should first assess whether the outliers carry information or are merely noise in the data.
- Spurious correlation: This refers to correlations that are a result of chance, rather than a genuine relationship between the variables. For example, the correlation between monthly US retail sales of alcohol and atmospheric carbon dioxide levels between 2000 and 2018 was found to be 0.824. There is no logical reason for one to drive the other. Spurious correlation can also arise when both variables are related to a third variable. For example, a person’s height may be correlated with the extent of their vocabulary, which makes little sense on its face. However, the underlying relationships between height and age, and between age and vocabulary, may explain the correlation.
- Correlation does not imply causation: Just because two variables are correlated does not mean that one variable causes the other. For example, there may be a correlation between height and vocabulary, but that does not necessarily mean that having a wider vocabulary is caused by a person being taller. This distinction is important. You would not want an employer to be misled into thinking that it is better to hire taller people because they may have a wider vocabulary!
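To make the outlier point in the list above concrete, the sketch below compares the correlation of a perfectly linear series with and without one extreme observation. The data and the helper `pearson_r` are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient (the n - 1 terms cancel out)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: six points on the line y = 2x ...
x_clean = [1, 2, 3, 4, 5, 6]
y_clean = [2, 4, 6, 8, 10, 12]

# ... plus one extreme observation at x = 7
x_all = x_clean + [7]
y_all = y_clean + [0]

r_clean = pearson_r(x_clean, y_clean)
r_with_outlier = pearson_r(x_all, y_all)
print(round(r_clean, 2), round(r_with_outlier, 2))  # 1.0 0.25
```

A single point is enough to pull the coefficient from 1.0 down to 0.25 here, which is why it is worth checking whether an outlier is informative or just noise before trimming or winsorizing it.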
Being aware of these limitations will help you make better interpretations of correlation coefficients and avoid drawing incorrect conclusions from your analysis.
And that concludes our quick lesson on covariance and correlation, as well as our topic on organizing, visualizing, and describing data. Take a break, and we’ll see you at the next topic!