Tutorial
Explaining anomalies in energy markets with Machine Learning models
Discover how to explain the influence of explanatory variables in a Machine Learning model that detects anomalies using SHAP values.
The fact that it is a weekend does not always influence the detection of anomalies in electricity prices in the same way. Why?
Machine Learning models do not follow linear patterns. We explain it step by step.
Data
For each hour of a day, the price of electricity is determined by the daily and intraday markets.
In this case, we will explain how explanatory variables influence the detection of anomalous observations using SHAP values.
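As a minimal sketch of what the dataset might look like (the file name and column names are assumptions; adapt them to your own data source), each row holds the hourly prices of the daily and intraday markets plus a WEEKEND flag:
import pandas as pd
# Hypothetical file and column names: hourly prices for the daily market,
# two intraday markets, and a WEEKEND flag (1 = Saturday/Sunday, 0 = weekday)
df = pd.read_csv("electricity_prices.csv", index_col="datetime", parse_dates=True)
print(df.columns.tolist())  # e.g. ['DAILY_PRICE', 'INTRADAY_1', 'INTRADAY_2', 'WEEKEND']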
Questions
- What technique is used to explain Machine Learning models?
- How to interpret SHAP values?
- Why does the same value of WEEKEND influence the model’s result differently?
- How to visualize the influence of each explanatory variable in anomaly detection across all observations?
- And in a specific observation?
Methodology
Model for detecting anomalies
We use the Isolation Forest algorithm to detect anomalies in historical data.
from sklearn.ensemble import IsolationForest
# Train an Isolation Forest, assuming roughly 1% of the observations are anomalous
model = IsolationForest(contamination=0.01, random_state=42)
model.fit(df)
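A minimal sketch to visualize the detected anomalies over time (the column name INTRADAY_2 and the datetime index are assumptions; adapt them to your data):
import matplotlib.pyplot as plt
# predict() returns -1 for anomalies and 1 for normal observations
labels = model.predict(df)
anomalies = df[labels == -1]
plt.figure(figsize=(12, 4))
plt.plot(df.index, df["INTRADAY_2"], label="Intraday market 2 price")
plt.scatter(anomalies.index, anomalies["INTRADAY_2"], color="red", label="Detected anomaly")
plt.legend()
plt.show()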
The chart makes it easy to see when anomalies occur. But beyond visualizing them, we need to explain how the explanatory variables influence their detection.
Visit this tutorial to learn more about the Isolation Forest algorithm.
SHAP values to explain anomalies
We create an explainer for the model and calculate SHAP values to explain anomalies.
import shap
# Model-agnostic explainer built from the model's predict function, with df as background data
explainer = shap.Explainer(model.predict, df)
shap_values = explainer(df)
After calculating the SHAP values, we visualize the importance of the explanatory variables in anomaly detection.
SHAP summary
The price of intraday market 2 is the most important variable for detecting anomalies, with an average absolute influence of 0.3.
shap.plots.bar(shap_values)
In addition to average values, we can visualize the summary chart to understand the influence of explanatory variables on each observation.
shap.summary_plot(shap_values)
What is most significant?
If we look at WEEKEND, low values (in blue, i.e. weekdays) push the prediction to the right, towards normal observations.
Conversely, weekends (high values, in red) push it to the left, making it more likely that those observations are flagged as anomalous.
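To see this pattern for a single variable, we can draw a SHAP scatter (dependence) plot; the column name WEEKEND is assumed to match the dataset:
# SHAP contribution of WEEKEND for every observation: the same feature value
# does not always receive the same contribution
shap.plots.scatter(shap_values[:, "WEEKEND"])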
What additional conclusions can you draw from this analysis? Let me know in the comments.
SHAP in detail
Furthermore, we can analyze the specific influence of explanatory variables on each observation.
shap.plots.waterfall(shap_values[0])
As you can see, the influence of the same explanatory variable is not always the same despite having the same value (see WEEKEND).
Remember that Machine Learning models do not follow linear patterns.
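A quick numeric check of this non-linearity (again assuming a WEEKEND column): for observations that share WEEKEND = 1, the SHAP contribution of WEEKEND still varies.
import numpy as np
# SHAP contributions of WEEKEND restricted to weekend observations
weekend_col = list(df.columns).index("WEEKEND")
mask = (df["WEEKEND"] == 1).to_numpy()
print(np.round(shap_values.values[mask, weekend_col], 3))  # different values for the same input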
Interactive SHAP
Finally, we can visualize the detail of all observations in an interactive chart.
shap.initjs()  # load the JavaScript renderer required for interactive SHAP plots in a notebook
shap.force_plot(explainer.expected_value, shap_values.values, df)
Conclusions
- Technique for Explaining Models: SHAP (SHapley Additive exPlanations) to explain the non-linearity of Machine Learning models.
- Interpreting SHAP Values: How much does each explanatory variable influence the model, on average?
- Different Influence of WEEKEND=1: When explaining non-linear models, the same value of an explanatory variable can influence the model’s result differently.
- Visualizing Influence Across All Observations: shap.summary_plot(shap_values) shows a ranking of the influence of explanatory variables among all observations.
- Visualizing in a Specific Observation: shap.plots.waterfall(shap_values[0]) details the influence per variable in a specific observation.
If you could program whatever you wanted, what would it be? I could create a tutorial about it ;)
Let’s talk in the comments below.