[Figure: comparison of evaluation metrics for supervised and unsupervised machine learning models, including MSE, MAE, Accuracy, Precision, Recall, F1-Score, Silhouette Score, and Explained Variance.]

Regression in machine learning is used to predict continuous values like house prices, tomorrow’s temperature, or someone’s income. But once you’ve built your model, how do you know how accurate it is? For example, if your model predicts a house price that’s 100 million lower than the actual price, how significant is this error? This is where evaluation metrics come into play! These metrics act like a thermometer, showing where your model shines and where it falls short.

In this article, we’ll break down the main regression metrics in simple, relatable terms. We’ll cover simplified formulas, real-world examples, and when to use (or avoid) each metric. You’ll also learn how to combine metrics for a clearer picture of your model’s performance.


Regression Metrics: Which One Should You Use?

Each metric measures a specific aspect of your model’s error. The right choice depends on your data type and project goals. For instance, if your dataset has extreme values (outliers), you’ll need a specific metric. Let’s dive in:


1. Mean Squared Error (MSE) – Heavy Penalty for Large Errors

What is it?
MSE calculates the average of the squared errors, so large errors are punished far more harshly than small ones. For example, a prediction error of 50 million squares to 2,500 (in millions squared), giving it 25 times the weight of a 10-million error, which squares to just 100.

Simplified Formula:

    \[MSE = \frac{(Error_1)^2 + (Error_2)^2 + … + (Error_n)^2}{Number\ of\ Samples}\]

Example:

  • Model Prediction: 190 million → Actual Price: 200 million → Error: 10 million
  • Model Prediction: 250 million → Actual Price: 300 million → Error: 50 million
  • Model Prediction: 140 million → Actual Price: 150 million → Error: 10 million

    \[MSE = \frac{10^2 + 50^2 + 10^2}{3} = \frac{100 + 2500 + 100}{3} = 900\]

Pros & Cons:

  • Pros: Prioritizes large errors → Ideal for critical predictions (e.g., life-saving drug prices).
  • Cons: Sensitive to outliers → Misleading if your data has extreme values.
  • Cons: Unit is squared (e.g., “Toman²”) → Hard to interpret.

Comparison with MAE:

  • Use MAE if your data has many outliers, as it weighs errors linearly instead of squaring them.
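
To make this concrete, here is a minimal sketch that reproduces the worked example above with scikit-learn (prices in millions, as in the example):

```python
from sklearn.metrics import mean_squared_error

# Actual prices and model predictions from the example above, in millions
y_true = [200, 300, 150]
y_pred = [190, 250, 140]

# Average of squared errors: (10**2 + 50**2 + 10**2) / 3 = 900
mse = mean_squared_error(y_true, y_pred)
print(mse)  # 900.0
```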

2. Mean Absolute Error (MAE) – Fairly Measures All Errors

What is it?
MAE calculates the average of the absolute errors, without squaring them. Every error counts in proportion to its size, so large errors are not given extra weight.

Simplified Formula:

    \[MAE = \frac{|Error_1| + |Error_2| + … + |Error_n|}{Number\ of\ Samples}\]

Example (Same Data as Above):

    \[MAE = \frac{10 + 50 + 10}{3} \approx 23.33\]

Pros & Cons:

  • Pros: Robust to outliers → Great for noisy data.
  • Cons: Doesn’t single out large errors → a 50M error weighs only 5 times as much as a 10M error, versus 25 times under MSE.
  • Cons: Less convenient for optimization (e.g., Gradient Descent) because the absolute value is not differentiable at zero.

Comparison with MSE:

  • Use MSE if large errors are critical (e.g., disaster prediction).
  • Use MAE for datasets with outliers or to avoid exaggerating errors.
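
A minimal sketch of the same calculation with scikit-learn, reusing the data from the MSE example:

```python
from sklearn.metrics import mean_absolute_error

# Same actual prices and predictions as the MSE example, in millions
y_true = [200, 300, 150]
y_pred = [190, 250, 140]

# Average of absolute errors: (10 + 50 + 10) / 3 ≈ 23.33
mae = mean_absolute_error(y_true, y_pred)
print(round(mae, 2))  # 23.33
```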

3. Root Mean Squared Error (RMSE) – MSE in Understandable Units

What is it?
RMSE is the square root of MSE. It aligns the error unit with the original data for easier interpretation.

Simplified Formula:

    \[RMSE = \sqrt{MSE}\]

Example (Using MSE Above):

    \[RMSE = \sqrt{900} = 30\]

Roughly speaking, this means the model’s typical prediction error is about 30 million, in the same units as the prices themselves.

Pros & Cons:

  • Pros: Easy to explain → Non-technical stakeholders get it.
  • Cons: Still sensitive to outliers.

Use Case:

  • Business reports where clarity matters.
  • Problems with normally distributed errors.
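
In code, RMSE is simply the square root of the MSE result; a minimal sketch with NumPy and scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [200, 300, 150]
y_pred = [190, 250, 140]

# Square root of MSE: sqrt(900) = 30, in the same units as the prices
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # 30.0
```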

4. R-Squared (R²) – How Much Better Is Your Model Than a Simple Baseline?

What is it?
R² measures how much better your model performs than a simple baseline (e.g., always predicting the mean). A value of 1 is a perfect fit, 0 means the model is no better than the baseline, and negative values mean it is actually worse than the baseline.

Simplified Formula:

    \[R^2 = 1 - \frac{Sum\ of\ Squared\ Errors\ (Your\ Model)}{Sum\ of\ Squared\ Errors\ (Baseline\ Model)}\]

Example:

  • Baseline model error: 2500
  • Your model error: 500

        \[R^2 = 1 - \frac{500}{2500} = 0.8\]


    This means your model explains 80% of the data variance.

Pros & Cons:

  • Pros: Easy to compare models.
  • Cons: A high R² can mask overfitting → the model may still perform poorly on new data.
  • Cons: Doesn’t indicate bias.

Use Case:

  • Initial model evaluation to gauge overall performance.
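
The worked example gives only the two error sums, so the sketch below plugs them straight into the formula; on real data you would call scikit-learn’s r2_score on the prediction arrays instead:

```python
from sklearn.metrics import r2_score

# Error sums from the example above
sse_model = 500      # sum of squared errors, your model
sse_baseline = 2500  # sum of squared errors, mean-only baseline

r2 = 1 - sse_model / sse_baseline
print(r2)  # 0.8

# With raw arrays of actuals and predictions:
# r2 = r2_score(y_true, y_pred)
```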

5. Mean Absolute Percentage Error (MAPE) – Error in Percentage Terms

What is it?
MAPE expresses error as a percentage, useful for data with varying scales.

Simplified Formula:

    \[MAPE = \frac{100\%}{n} \times \sum \left| \frac{Error}{Actual\ Value} \right|\]

Example:

  • Actual Value: 200 million → Prediction: 180 million → Error: 20 million

        \[MAPE = \left| \frac{20}{200} \right| \times 100\% = 10\%\]

Pros & Cons:

  • Pros: Scale-independent → Compare errors across datasets.
  • Cons: Fails if actual values are zero (division by zero).
  • Cons: Unreliable for values close to zero.

Use Case:

  • Sales forecasting for products with varying prices (e.g., $10 to $10,000).
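
A minimal sketch using scikit-learn’s mean_absolute_percentage_error (available in version 0.24 and later); note that it returns a fraction, so multiply by 100 to get a percentage:

```python
from sklearn.metrics import mean_absolute_percentage_error

# Single sample from the example: actual 200, predicted 180 (millions)
y_true = [200]
y_pred = [180]

# Returns a fraction: |20 / 200| = 0.10
mape = mean_absolute_percentage_error(y_true, y_pred)
print(f"{mape * 100:.0f}%")  # 10%
```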

6. Mean Squared Logarithmic Error (MSLE) – For Wide-Ranging Data

What is it?
MSLE uses logarithms to reduce the impact of large value differences. Ideal for data with exponential growth (e.g., population or sales over decades).

Simplified Formula:

    \[MSLE = \frac{1}{n} \sum (\log(Actual + 1) - \log(Prediction + 1))^2\]

Example:

  • Actual Value: 1000 → Prediction: 1200

        \[MSLE \approx (\log(1001) - \log(1201))^2 \approx (6.909 - 7.091)^2 \approx 0.033\]

Pros & Cons:

  • Pros: Reduces emphasis on large absolute errors.
  • Cons: Harder to interpret due to logarithms.

Use Case:

  • Predicting startup revenue growth (from $100 to billions).
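
A minimal sketch with scikit-learn, which applies the log(value + 1) transform internally:

```python
from sklearn.metrics import mean_squared_log_error

# Example above: actual 1000, predicted 1200
y_true = [1000]
y_pred = [1200]

# (log(1001) - log(1201))**2 ≈ 0.033
msle = mean_squared_log_error(y_true, y_pred)
print(round(msle, 3))  # 0.033
```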

Combining Metrics: Why One Metric Isn’t Enough

Relying on one metric is like painting with one eye closed! Combining metrics gives you a broader perspective.

Smart Combinations:

  1. MSE + MAE:
  • High MSE but low MAE? → Your model makes huge errors on a few outliers.
  • Example: Predicting luxury home prices → MSE flags errors on expensive homes, while MAE shows good performance for average homes.
  2. R² + RMSE:
  • R² shows overall performance; RMSE gives the real-world error magnitude.
  • Example: R² = 0.9 looks great, but an RMSE of 50M might be unacceptable for the business.
  3. MAPE + MSLE:
  • Use for datasets with mixed scales (e.g., $10 apps vs. $50M laptops).

Practical Example:

If you build a model to predict building electricity consumption:

  • Using only MSE lets the large absolute errors from high-consumption industrial buildings dominate, hiding poor relative performance on smaller buildings.
  • Combining MSE (absolute error) with MAPE (% error) gives a complete picture.
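
As a sketch of how this looks in practice, the snippet below computes several metrics side by side on hypothetical consumption data (the numbers are made up for illustration; substitute your own y_true and y_pred):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

# Hypothetical consumption values (kWh): a mix of homes and industrial buildings
y_true = np.array([120, 95, 3000, 110, 2800])
y_pred = np.array([130, 90, 2500, 100, 3100])

mse = mean_squared_error(y_true, y_pred)
print("MSE :", mse)                                  # absolute error, squared units
print("RMSE:", np.sqrt(mse))                         # absolute error, original units
print("MAE :", mean_absolute_error(y_true, y_pred))  # absolute error, robust to outliers
print("R²  :", r2_score(y_true, y_pred))             # relative to a mean-only baseline
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred) * 100, "%")  # relative error
```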

Conclusion: How to Choose the Right Metric?

  1. Define Your Goal:
  • Care about % error? → MAPE.
  • Need absolute error? → RMSE.
  2. Understand Your Data:
  • Many outliers? → MAE or MAPE.
  • Wide value ranges? → MSLE.
  3. Combine Metrics:
  • Always use 2-3 metrics to avoid blind spots.

Remember, there’s no universal “best” metric. The right one depends on your problem, data, and business needs!