R-Squared Error Metric

Raisul Hazari
2 min read · Sep 15, 2020

In general, we can divide ML problems into two broad categories: Classification, where the model predicts discrete labels (for example Yes or No, 0 or 1), and Regression, where the model predicts real values.

R-Squared (R²) is an error metric for regression problems, and its range is (-∞, 1].

Before explaining what exactly R-squared is, we should know two quantities:

i) Sum of Squares (SS):
let,
yᵢ be the actual value given in the dataset
ŷᵢ be the value predicted by the model
Now, SS is determined as ∑(yᵢ-ŷᵢ)² where i = 1,2,3….n. This is also written as SS(residue).
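The sum above is a one-liner in code. A minimal sketch with NumPy, using hypothetical actual and predicted values just for illustration:

```python
import numpy as np

# Hypothetical actual values and model predictions (for illustration only)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# SS(residue): sum of squared differences between actual and predicted values
ss_residue = np.sum((y - y_hat) ** 2)
print(ss_residue)  # ≈ 0.11
```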

ii) SS(Total):
Here, we take the predicted value to be the mean of the actual values. In other words, instead of building a model, we blindly predict the mean for every record, regardless of the data's complexity.
For example, if the actual values for five records are (1, 2, 3, 4, 5), this "mean model" predicts the mean of those 5 records, (3, 3, 3, 3, 3), in every case.
So, SS(Total) is calculated as ∑(yᵢ-y̅)² where y̅ = mean(yᵢ) and
i = 1,2,3….n.
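SS(Total) for the five-record example above can be verified directly; the mean is 3, so the squared deviations are 4 + 1 + 0 + 1 + 4 = 10:

```python
import numpy as np

# The five actual values from the example above
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# The "mean model" predicts y-bar for every record
y_bar = np.mean(y)                    # 3.0
ss_total = np.sum((y - y_bar) ** 2)
print(ss_total)                       # 10.0
```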

Now, the R-squared error can be determined by
R² = 1 − SS(residue)/SS(Total)
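Putting the two quantities together, the formula can be sketched as a small function (the numbers reuse the hypothetical example values from above):

```python
import numpy as np

def r_squared(y, y_hat):
    """R² = 1 - SS(residue) / SS(Total)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_residue = np.sum((y - y_hat) ** 2)
    ss_total = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_residue / ss_total

# SS(residue) ≈ 0.11, SS(Total) = 10, so R² ≈ 0.989
print(r_squared([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1]))
```

This matches what libraries such as scikit-learn compute in their `r2_score` function.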

Let’s discuss each case:

Case-1: if SS(residue) < SS(Total)
The model predicted better than the mean model.
So, 0 < R² < 1. This is the ideal situation; the closer the R² value is to 1, the better the model.

Case-2: if SS(residue) = SS(Total)
The model predicts no better than the mean model, and R² = 0.

Case-3: if SS(residue) > SS(Total)
This is the worst-case scenario: the model is worse than the mean model and needs further improvement, so -∞ < R² < 0. The lower the R² value, the worse the model.

If you have questions, please write them in the comment section.
