4.2.3 Human Explanations
Most existing work in ML interpretability relies solely on the researchers’ intuition of what constitutes an appropriate explanation for humans. Human explanations concern how language or diagrams can be used to present the results of model-specific and model-agnostic methods in a way people can understand.
Human-friendly explanations:
Contrastiveness: Explanations that present some contrast between the instance to explain and a reference point are preferable.
Selectivity: People do not expect explanations that cover the actual and complete list of causes of an event. Instead, they prefer one or two main causes selected from the variety of possible causes (see the sketch after this list for how contrastiveness and selectivity might be combined).
Social: Explanations are part of a social interaction between the explainer and the explainee.
Focus on the abnormal: In terms of ML interpretability, if one of the input feature values for a prediction was abnormal in any sense (e.g., a rare category) and the feature influenced the prediction outcome, it should be included in the explanation, even if other, more frequent feature values have the same influence on the prediction as the abnormal one.
Truthful: Good explanations prove to be true in reality. The explanation should predict the event as truthfully as possible, which in machine learning is sometimes called fidelity (see the fidelity sketch after this list). With respect to ML interpretability, this means that an explanation must make sense (be plausible) and hold for predictions of other instances.
Consistent with prior beliefs of the explainee: People tend to ignore information that is inconsistent with their prior beliefs. However, this criterion can conflict with truthfulness, since an explanation cannot always be both true to the model and consistent with the explainee’s prior beliefs.
General and probable: A cause that can explain many events is very general and could therefore be considered a good explanation.
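The following is a minimal sketch of how contrastiveness and selectivity might be operationalized together: the explanation is restricted to the few features that differ most, in importance-weighted terms, between the instance being explained and a reference point. The function, feature names, and importance scores are hypothetical placeholders, not part of any particular library.

```python
import numpy as np

def contrastive_explanation(instance, reference, feature_names, importances, top_k=2):
    """Keep only the top_k features that differ between the instance and a
    reference point, ranked by |importance * difference| (hypothetical scoring)."""
    instance = np.asarray(instance, dtype=float)
    reference = np.asarray(reference, dtype=float)
    importances = np.asarray(importances, dtype=float)

    # Contrastiveness: explain relative to a reference point, not in isolation.
    diffs = instance - reference
    # Selectivity: rank by how much each difference matters and keep only top_k.
    scores = np.abs(importances * diffs)
    top = np.argsort(scores)[::-1][:top_k]

    return [
        f"{feature_names[i]}: {instance[i]:g} vs. reference {reference[i]:g}"
        for i in top
    ]

# Hypothetical example: why was this applicant denied, compared with an
# "average approved applicant" used as the reference point?
features = ["income", "age", "num_late_payments"]
print(contrastive_explanation(
    instance=[30000, 45, 4],
    reference=[55000, 40, 0],
    feature_names=features,
    importances=[0.5, 0.1, 0.8],  # assumed per-feature importance scores
    top_k=2,
))
# -> ['income: 30000 vs. reference 55000', 'num_late_payments: 4 vs. reference 0']
```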
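Fidelity, mentioned under "Truthful", is often quantified as the agreement between the explanation model (for example, an interpretable surrogate) and the black-box model on a set of instances. Below is a minimal sketch under that assumption; `black_box.predict`, `surrogate.predict`, and `X_validation` are placeholder names.

```python
import numpy as np

def fidelity(black_box_predict, explanation_predict, X):
    """Fraction of instances on which the explanation model (e.g., an
    interpretable surrogate) agrees with the black-box model's prediction."""
    bb = np.asarray(black_box_predict(X))
    ex = np.asarray(explanation_predict(X))
    return float(np.mean(bb == ex))

# Hypothetical usage with two already-fitted classifiers:
# agreement = fidelity(black_box.predict, surrogate.predict, X_validation)
```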