Simple Linear Regression: An Introduction | CFA Level I Quantitative Methods

Welcome to a new topic on linear regression. In this first lesson, we introduce simple linear regression, which involves two variables: an independent variable and a dependent variable. We will learn how to use the linear regression model to describe the relationship between the two variables, and how to estimate and apply the parameters of the model.

Understanding Dependent and Independent Variables

Imagine you wish to predict the average number of months of bonus your employer will pay out this year. One obvious approach is to use historical data and try to determine whether there is a relationship between bonus payouts and the profits the company makes in each year.

So here we have two variables:

Dependent variable (Y) is the variable which you are trying to predict or explain. The dependent variable is also referred to as the explained variable, the endogenous variable, or the predicted variable.
Independent variable (X) is the variable you are using to explain the variation in the dependent variable.

Visualizing the Relationship and Understanding Simple Linear Regression

To help us visualize the relationship between the two variables, we can plot past bonus payouts against company profits. In such a plot, there appears to be a relationship between the two variables. If this relationship is linear and there is just one independent variable, we call this a simple linear regression, which is described by the following regression model:

Y = B0 + B1 * X + ε

Where:

B0 is the intercept.
B1 is the slope coefficient.
ε is the error term.

Based on this regression model, the regression process estimates an equation for the line that best fits the observed values of Y in terms of the observed values of X. The criterion for estimating this line is that the sum of the squared differences between the predicted Y-values and the actual Y-values is minimized. This quantity is termed the sum of squared errors (SSE). We will learn how to calculate the SSE in the next lesson. In mathematics, this form of simple linear regression is often referred to as the least squares method.
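To make the minimization criterion concrete, here is a minimal sketch that compares the SSE of two candidate lines. The data points and the helper name sse are invented for illustration and are not part of the lesson.

```python
def sse(x, y, b0, b1):
    """Sum of squared errors between actual y and the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, invented for illustration only (not from the lesson).
profit = [1.0, 2.0, 3.0, 4.0, 5.0]  # X: company profit, $ millions
bonus = [1.5, 1.7, 2.1, 2.2, 2.5]   # Y: months of bonus

# For this data the least squares line works out to Y = 1.25 + 0.25X.
sse_least_squares = sse(profit, bonus, b0=1.25, b1=0.25)

# Any other line, such as Y = 1.00 + 0.30X, has a larger SSE.
sse_other = sse(profit, bonus, b0=1.00, b1=0.30)

print(sse_least_squares < sse_other)  # True
```

The least squares line is simply the candidate with the smallest SSE among all possible intercept and slope pairs.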

Estimating the Slope Coefficient (B1) and Intercept (B0)

To estimate this line, we need to estimate the two parameters of this line: the slope coefficient B1, and the intercept B0.

Formula to calculate the slope coefficient (B1):

B1 = Covariance(X, Y) / Variance(X)

Formula to calculate the intercept (B0):

B0 = Mean(Y) – B1 * Mean(X)
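As a rough sketch, the two formulas above can be applied to raw data as follows. The function name fit_simple_regression and the data points are hypothetical, and the sketch assumes the sample (n - 1) versions of covariance and variance; the divisor cancels in the ratio, so the slope is the same either way.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1) using B1 = Cov(X, Y) / Var(X) and B0 = Mean(Y) - B1 * Mean(X)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance of X and Y, and sample variance of X.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    b1 = cov_xy / var_x
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data, invented for illustration only (not from the lesson).
profit = [1.0, 2.0, 3.0, 4.0, 5.0]  # X: company profit, $ millions
bonus = [1.5, 1.7, 2.1, 2.2, 2.5]   # Y: months of bonus

b0, b1 = fit_simple_regression(profit, bonus)
# b0 is approximately 1.25 and b1 is approximately 0.25 for this data.
```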

EXAMPLE

Suppose the slope coefficient B1 estimated from past data is 0.3. What is the significance of this number? The slope coefficient describes the change in Y for a one-unit change in X, so in this example, the estimate is that for every extra $1 million earned by your company, your bonus will increase by 0.3 months.

Again based on past data, suppose you compute that the mean bonus is 2.4 months and the mean company profit is $4.2 million. Applying the intercept formula, B0 = 2.4 – 0.3 × 4.2, you get 1.14 as the estimate of the intercept B0.

The intercept is an estimate of the dependent variable when the independent variable is zero. So in this example, it means that if the company makes zero profit for the year, you can still expect 1.14 months of bonus from the company.

Finding the Line of Best Fit

Now that we have estimated both the parameters, the least squares method gives us the line of best fit as Y = 1.14 + 0.3X. So if the company makes a profit of $8 million for the year, you can expect around 3.54 months of bonus to be declared for the year.
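The arithmetic for this example can be checked with a short sketch. The numbers come from the lesson; the variable names and the helper predict_bonus are our own labels.

```python
mean_bonus = 2.4   # mean months of bonus (from the lesson)
mean_profit = 4.2  # mean company profit, $ millions (from the lesson)
b1 = 0.3           # slope estimate (from the lesson)

# Intercept: B0 = Mean(Y) - B1 * Mean(X)
b0 = mean_bonus - b1 * mean_profit

def predict_bonus(profit_millions):
    """Months of bonus predicted by the fitted line Y = B0 + B1 * X."""
    return b0 + b1 * profit_millions

forecast = predict_bonus(8.0)
print(round(b0, 2), round(forecast, 2))  # 1.14 3.54
```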

Practice Exercises

EXAMPLE

As a junior analyst, you are tasked with performing a simple linear regression to predict the excess returns of TinyPower shares based on the excess returns of the Tinyland Stock Index (TSI).

Which variable is the dependent variable, and which is the independent variable?

Now let’s recall the definitions. The dependent variable is the variable that you are trying to predict or explain, which in this case is the excess returns of TinyPower shares.

The independent variable is the variable that you are using to explain the variation in the dependent variable, which in this case is the excess returns of the Tinyland Stock Index.

Next, given the following statistics, estimate the regression line that minimizes SSE.

Variable                    Mean (%)   Std Dev (%)   Cov(X,Y)
Tinyland Stock Index (X)    2.21       6.17          74.37
TinyPower shares (Y)        3.01       12.47

To find the line that minimizes SSE, we apply the least squares method that we learnt earlier: we estimate the slope coefficient B1 and the intercept B0. Applying the formulas,

B1 = Covariance(X, Y) / Variance(X) = 74.37 / 6.17² ≈ 1.95

B0 = Mean(Y) – B1 * Mean(X) = 3.01 – 1.95 × 2.21 ≈ -1.3

we estimate that the slope coefficient is 1.95, and the intercept is -1.3.

The line of best fit for the data is therefore Y = -1.3 + 1.95X.
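Here is a minimal sketch of the same estimation starting from the summary statistics. The figures are those given in the exercise; the variable names are our own.

```python
mean_x = 2.21   # mean excess return of the Tinyland Stock Index, %
sd_x = 6.17     # standard deviation of the index excess return, %
mean_y = 3.01   # mean excess return of TinyPower shares, %
cov_xy = 74.37  # covariance between the two excess returns

# Variance(X) is the standard deviation squared.
b1 = cov_xy / sd_x ** 2
b0 = mean_y - b1 * mean_x

print(round(b1, 2))  # 1.95
print(round(b0, 1))  # -1.3
```

Note that the standard deviation of Y (12.47) is not needed to estimate the line; only the mean of Y enters the intercept formula.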

And lastly, given that the excess return of TSI for next year is expected to be 4.5%, what is the forecast for TinyPower shares excess return for next year using this regression model?

You should not have much of a problem with this. To get the forecast, simply plug the figure for X into the regression model: Y = -1.3 + 1.95 × 4.5 = 7.475, which gives a forecasted excess return of about 7.48% for TinyPower shares.
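As a quick check, the sketch below computes the forecast twice: once with the rounded coefficients used in the lesson, and once with unrounded coefficients recomputed from the summary statistics. The helper name forecast_excess_return is hypothetical.

```python
def forecast_excess_return(b0, b1, x):
    """Forecast of Y from the regression line Y = B0 + B1 * X."""
    return b0 + b1 * x

tsi_excess = 4.5  # expected excess return of TSI next year, %

# Using the rounded coefficients from the lesson: about 7.475.
with_rounded = forecast_excess_return(-1.3, 1.95, tsi_excess)

# Using unrounded coefficients from the summary statistics: about 7.48.
b1_full = 74.37 / 6.17 ** 2
b0_full = 3.01 - b1_full * 2.21
with_full = forecast_excess_return(b0_full, b1_full, tsi_excess)
```

Both versions agree to roughly 7.48%, so rounding the coefficients makes little difference to the forecast in this case.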

Conclusion

We've introduced simple linear regression, explained dependent and independent variables, and learned how to estimate the parameters of the model. We've also practiced finding the line of best fit and making predictions using our regression model.
