Machine Learning

When Shapley Values Break Down: A Guide to Robust Model Interpretation

Interpretability is essential for building confidence in model predictions and is critical for improving model robustness. Explanations often serve as a debugging tool, revealing errors in the model training process. Although Shapley values have become the industry standard for this task, we must ask: do they always work? And more importantly, when do they fail?

To understand where Shapley values fail, the best approach is to control the ground truth. We'll start with a simple linear model, then systematically break its assumptions. By observing how Shapley values react to these controlled changes, we can pinpoint exactly where they produce misleading results and how to correct them.

Toy Model

We will start with a model that has 100 independent random features.

import numpy as np
from sklearn.linear_model import LinearRegression
import shap

def get_shapley_values_linear_independent_variables(
    weights: np.ndarray, data: np.ndarray
) -> np.ndarray:
    return weights * data

# To compare the theoretical results with the shap package
def get_shap(weights: np.ndarray, data: np.ndarray):
    model = LinearRegression()
    model.coef_ = weights  # Inject your weights
    model.intercept_ = 0
    background = np.zeros((1, weights.shape[0]))
    explainer = shap.LinearExplainer(model, background)  # Assumes independence between all features
    results = explainer.shap_values(data) 
    return results

DIM_SPACE = 100

np.random.seed(42)
# Generate random weights and data
weights = np.random.rand(DIM_SPACE)
data = np.random.rand(1, DIM_SPACE)

# Set specific values to test our intuition
# Feature 0: High weight (10), Feature 1: Zero weight
weights[0] = 10
weights[1] = 0
# Set maximal value for the first two features
data[0, 0:2] = 1

shap_res = get_shapley_values_linear_independent_variables(weights, data)
shap_res_package = get_shap(weights, data)
idx_max = shap_res.argmax()
idx_min = shap_res.argmin()

print(
    f"Expected: idx_max 0, idx_min 1\nActual: idx_max {idx_max}, idx_min: {idx_min}"
)

print(abs(shap_res_package - shap_res).max())  # No difference

In this specific example, where all the features are independent, the calculation becomes very simple.

Recall that the Shapley formula is built from marginal contributions: for each feature, the difference in model output when the feature is added to a coalition of already-known features versus when it is not:

V(S ∪ {i}) − V(S)

Since the features are independent, the particular coalition of preselected features S has no effect on the contribution of feature i. The effect of the other features cancels out in the subtraction, so the calculation reduces to the marginal effect of feature i on the model output:

W_i · X_i

The result is both exact and behaves as expected. Because there is no interaction with other features, each contribution depends only on the feature's weight and its current value. As a result, the feature with the largest product of weight and value contributes the most. In our case, feature index 0 has a weight of 10 and a value of 1.
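As a quick sanity check of this reduction, we can verify the Efficiency axiom directly: with a zero background, the per-feature attributions W_i · X_i must sum exactly to the model's output. Here is a minimal sketch (the random values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
weights = rng.random(100)
x = rng.random(100)

# For a linear model with independent features, each feature's
# Shapley value is simply w_i * x_i.
attributions = weights * x

# Efficiency axiom: attributions sum to f(x) - f(baseline).
model_output = weights @ x                 # f(x) = w . x, intercept 0
baseline_output = weights @ np.zeros(100)  # zero background => f(baseline) = 0

assert np.isclose(attributions.sum(), model_output - baseline_output)
```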

Let's Break Things Down

Now, we will introduce a dependency to see where Shapley values start to fail.

In this case, we introduce perfect correlation by duplicating the strongest feature (index 0) 100 times. This yields a new model with 200 features, where 100 features are identical copies of our original top contributor and independent of the other 99. To complete the setup, we assign zero weight to all of the duplicated features, which ensures the model's predictions remain unchanged: we only change the structure of the input data, not the output. Although this setup seems extreme, it reflects a common real-world situation: taking a known important signal and creating multiple derived features (such as moving averages, lags, or statistical transformations) to better capture its information.

However, because the original feature 0 and its new copies are perfectly dependent, the Shapley calculation changes.

This follows from the Symmetry Axiom: if two features contribute equally to the model (in this case, by carrying the same information), they must receive equal credit.

In theory, knowing the value of any one clone reveals the full information of the group. As a result, the large contribution we previously saw for one feature is now spread evenly across it and its 100 clones. The signal becomes diluted, which makes the main driver of the model seem less important than it really is.
Here is the corresponding code:

import numpy as np
from sklearn.linear_model import LinearRegression
import shap

def get_shapley_values_linear_correlated(
    weights: np.ndarray, data: np.ndarray
) -> np.ndarray:
    res = weights * data
    duplicated_indices = np.array(
        [0] + list(range(data.shape[1] - DUPLICATE_FACTOR, data.shape[1]))
    )
    # Sum those contributions and split the total evenly among them
    full_contrib = np.sum(res[:, duplicated_indices], axis=1)
    duplicate_feature_factor = np.ones(data.shape[1])
    duplicate_feature_factor[duplicated_indices] = 1 / (DUPLICATE_FACTOR + 1)
    full_contrib = np.tile(full_contrib, (DUPLICATE_FACTOR+1, 1)).T
    res[:, duplicated_indices] = full_contrib
    res *= duplicate_feature_factor
    return res

def get_shap(weights: np.ndarray, data: np.ndarray):
    model = LinearRegression()
    model.coef_ = weights  # Inject your weights
    model.intercept_ = 0
    explainer = shap.LinearExplainer(model, data, feature_perturbation="correlation_dependent")    
    results = explainer.shap_values(data)
    return results

DIM_SPACE = 100
DUPLICATE_FACTOR = 100

np.random.seed(42)
weights = np.random.rand(DIM_SPACE)
weights[0] = 10
weights[1] = 0
data = np.random.rand(10000, DIM_SPACE)
data[0, 0:2] = 1

# Duplicate copy of feature 0, 100 times:
dup_data = np.tile(data[:, 0], (DUPLICATE_FACTOR, 1)).T
data = np.concatenate((data, dup_data), axis=1)
# Assign zero weight to all of the added features:
weights = np.concatenate((weights, np.zeros(DUPLICATE_FACTOR)))


shap_res = get_shapley_values_linear_correlated(weights, data)

shap_res = shap_res[0, :] # Take First record to test results
idx_max = shap_res.argmax()
idx_min = shap_res.argmin()

print(f"Expected: idx_max 0, idx_min 1\nActual: idx_max {idx_max}, idx_min: {idx_min}")

This is clearly not what we intended and fails to provide a good explanation of the model's behavior. Ideally, we want the explanation to reflect a basic fact: feature 0 is the primary driver (with a weight of 10), while the duplicated features (indices 100–199) are simply redundant copies with zero weight. Rather than diluting the signal across every copy, we would clearly prefer an attribution that highlights the original source of the signal.

Note: If you run this using the Python shap package, you may see results that are similar but not identical to our manual calculation. This is because computing Shapley values exactly is intractable: the cost grows exponentially with the number of features. Libraries like shap therefore rely on approximation methods, which introduce some variance.
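To make that intractability concrete, here is a sketch of the exact computation, enumerating every coalition for every feature. The function name and zero-baseline convention are illustrative assumptions, not the shap implementation. With n features it needs on the order of n · 2^(n−1) coalition evaluations, which is why it is only feasible for tiny n:

```python
import numpy as np
from itertools import combinations
from math import factorial

def exact_shapley_linear(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Brute-force Shapley values: for each feature, enumerate all 2^(n-1)
    coalitions of the remaining features. Features outside the coalition
    are set to a zero baseline (a fixed background)."""
    n = len(weights)

    def value(subset) -> float:
        # Model output with features in `subset` at their true values,
        # everything else at the zero baseline.
        z = np.zeros(n)
        idx = list(subset)
        z[idx] = x[idx]
        return float(weights @ z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                # Standard Shapley coalition weight |S|!(n-|S|-1)!/n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi

weights = np.array([10.0, 0.0, 1.0])
x = np.array([1.0, 1.0, 0.5])
phi = exact_shapley_linear(weights, x)
# For a linear model with a zero baseline, this matches w_i * x_i exactly.
```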

Image by the author (generated via Google Gemini).

Can We Fix This?

Since correlations and dependencies between factors are very common, we cannot ignore this problem.

To be fair, Shapley values account for this dependence by design. A feature with a coefficient of 0 in the linear model, and therefore no direct effect on the output, receives a non-zero contribution because it carries information shared with other features. However, this behavior, driven by the Symmetry Axiom, is not always what we want in a practical explanation. While "fairly" splitting credit between correlated features makes statistical sense, it often obscures the real drivers of the model.

Several strategies can address this, and we will explore them.

Grouping Features

This approach is particularly important for models with high-dimensional feature spaces, where feature correlation is unavoidable. In these settings, trying to assign an individual contribution to every variable is often noisy and statistically unstable. Instead, we can group related features that represent the same concept. A useful analogy comes from image segmentation: if we want to explain why a model predicts "cat" instead of "dog", examining individual pixels makes no sense. But if we group the pixels into patches (e.g., ears, tail), the explanation immediately becomes meaningful. Applying the same idea to tabular data, we can compute the contribution of a group rather than splitting it arbitrarily among its members.

This can be done in two ways: by simply summing the Shapley values within each group, or by computing the group's contribution directly. In the direct approach, we treat the group as a single entity: instead of perturbing individual features, we treat the presence or absence of the group as the simultaneous presence or absence of all features within it. This reduces the dimensionality of the problem, making the estimation faster, more accurate, and more stable.
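For the linear, independent case above, the first option, summing per-feature Shapley values within each group, takes only a few lines. This is a minimal sketch; the group names and index assignments are hypothetical:

```python
import numpy as np

def grouped_attributions(weights: np.ndarray, x: np.ndarray, groups: dict) -> dict:
    """Sum per-feature Shapley values (w_i * x_i for an independent
    linear model) within each named group of feature indices."""
    per_feature = weights * x
    return {name: float(per_feature[idx].sum()) for name, idx in groups.items()}

weights = np.array([10.0, 0.0, 2.0, 3.0])
x = np.array([1.0, 1.0, 0.5, 0.5])
groups = {"signal": [0, 1], "context": [2, 3]}  # hypothetical grouping

contrib = grouped_attributions(weights, x, groups)
# contrib["signal"] -> 10.0, contrib["context"] -> 2.5
```

For nonlinear models, the shap library's Partition explainer takes a related clustering-based view, attributing credit to hierarchical groups of features rather than individual columns; its documentation is worth checking before rolling your own grouping.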

Image by the author (generated via Google Gemini).

Winner Takes All

Although grouping works, it has limitations. It requires defining the groups in advance and often ignores relationships between the groups.

This leads to redundant explanations. Returning to our example: if the 101 correlated features are not grouped in advance, the output will list those 101 features, each with the same contribution, 101 times. This is overwhelming, repetitive, and useless. A practical explanation should minimize repetition and present new information to the user at each step.

To achieve this, we can build a greedy iterative process. Instead of computing all the values at once, we select features step by step:

  1. Select the "winner": identify the single feature (or group) with the highest individual contribution.
  2. Condition on it: re-evaluate the remaining features, assuming the features from previous steps are already known. In each Shapley computation, we add them to the preselected coalition S.
  3. Repeat: ask the model, "given that the user already knows about features A, B, and C, which remaining feature provides the most new information?"

By recalculating the Shapley values (or marginal contributions) conditioned on the preselected features, we ensure that redundant features drop to zero. If feature A and feature B are identical and feature A is selected first, feature B no longer provides new information. It is filtered out automatically, leaving a clean, short list of unique drivers.
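The loop above can be sketched for our duplicated-feature toy case. This is a simplified illustration: instead of re-running a full conditional Shapley computation at each step, it approximates "already known" by zeroing out the contribution of any feature that is (near-)perfectly correlated with a previous winner. The threshold and the correlation matrix are assumptions of the sketch:

```python
import numpy as np

def greedy_selection(contributions: np.ndarray, corr_matrix: np.ndarray,
                     n_steps: int, corr_threshold: float = 0.999) -> list:
    """Winner-takes-all selection: pick the largest remaining contribution,
    then drop features that are redundant given the winner."""
    remaining = contributions.copy()
    selected = []
    for _ in range(n_steps):
        winner = int(np.argmax(np.abs(remaining)))
        selected.append(winner)
        # Features (almost) perfectly correlated with the winner carry no
        # new information once the winner is known: zero them out.
        redundant = np.abs(corr_matrix[winner]) >= corr_threshold
        remaining[redundant] = 0.0
    return selected

# Toy setup: feature 0 duplicated as features 3 and 4.
contributions = np.array([10.0, 0.0, 2.0, 10.0, 10.0])
corr = np.eye(5)
for i, j in [(0, 3), (0, 4), (3, 4)]:
    corr[i, j] = corr[j, i] = 1.0

print(greedy_selection(contributions, corr, n_steps=2))  # [0, 2]
```

Note how the clones (indices 3 and 4) never appear in the output: once feature 0 is selected, their remaining contribution is zero.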

Image by the author (generated via Google Gemini).

Note: You can find an implementation of this exact grouping and greedy iterative computation in our Python package medpython.
Full disclosure: I am one of the authors of this open-source package.

Real World Verification

Although this toy model exposes the failure modes of Shapley values mathematically, how do the fixes hold up in real situations?

We applied both methods, grouped Shapley values and winner-takes-all selection, along with additional techniques (out of scope for this post, perhaps for a future one), in complex clinical models used in healthcare. Our models use hundreds of highly correlated features grouped into multiple concepts.

These methods were validated across several models in a blinded setting, where our clinicians did not know which method they were evaluating, and they outperformed vanilla Shapley values by the clinicians' standards. Each method improved on the previous one in the multi-step evaluation. Additionally, our team used these explanation enhancements as part of our submission to the CMS Health AI Challenge, where we were selected as award winners.

Photo by Centers for Medicare & Medicaid Services (CMS)

Conclusion

Shapley values are the gold standard for model interpretability, providing a mathematically principled approach to credit attribution.
However, as we have seen, mathematical "correctness" does not always translate into useful explanations.

When features are highly correlated, the signal gets diluted, hiding the true drivers of your model behind a wall of redundancy.

We tested two ways to fix this:

  1. Grouping: combine related features into a single concept.
  2. Iterative selection: condition on already-presented concepts to extract only new information, effectively eliminating redundancy.

By acknowledging these limitations, we can ensure that our explanations are meaningful and useful.

If you found this helpful, connect with me on LinkedIn.
