6. Discussion

6.1 Advantage

The Evaluation System

Interpretable machine learning has attracted growing attention in recent years. The natural follow-up question is how to evaluate an interpretable model comprehensively and fairly. We therefore built this evaluation system to help evaluate interpretable machine learning models. Its structure is based on a well-accepted interpretability evaluation taxonomy, which consists of three kinds of evaluation: application-grounded, human-grounded, and functionally grounded.

Our evaluation system likewise has three criteria: model-specific method evaluation, model-agnostic method evaluation, and human-friendly explanation evaluation. Under each criterion there is a list of keywords, which can be viewed as evaluation features from different perspectives.

We then scored those features for several model-specific methods (e.g., GLM, Decision Tree, GBM, XGBoost) as well as for model-agnostic methods (e.g., Variable Importance, PDP, ICE). This temporary score table is crucial for the next step, which tailors the interpretable model to the user's specific focus.

Recommendation System

Having built a relatively comprehensive evaluation system, we can make full use of it, not only to evaluate interpretable models after they are built but also to guide their design beforehand. Moreover, as mentioned earlier, evaluating an interpretable machine learning model is subjective, and it is hard to cover every aspect.

We therefore add the concept of a recommendation system. That is, we give users the flexibility to choose their preferred features from the keywords in our evaluation system. Based on the temporary score table, we can then generate lists of model-specific and model-agnostic methods. These lists help users build their interpretable model, and by using the recommended methods, the chosen evaluation features are fulfilled at the same time.
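The ranking step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the feature keywords, the scores, and the names `SCORE_TABLE` and `recommend` are hypothetical placeholders, and summing the scores of the selected features is one simple aggregation choice among many, not necessarily the rule our system uses.

```python
from typing import Dict, List, Tuple

# Hypothetical score table: method -> {feature keyword -> score}.
# Feature names and numbers are made-up placeholders, not our actual scores.
SCORE_TABLE: Dict[str, Dict[str, int]] = {
    "GLM": {"linearity": 5, "monotonicity": 4, "interaction": 1},
    "Decision Tree": {"linearity": 1, "monotonicity": 2, "interaction": 4},
    "XGBoost": {"linearity": 1, "monotonicity": 3, "interaction": 5},
}


def recommend(
    score_table: Dict[str, Dict[str, int]],
    preferred_features: List[str],
    top_k: int = 2,
) -> List[Tuple[str, int]]:
    """Rank methods by the summed score of the user's preferred features."""
    ranked = []
    for method, scores in score_table.items():
        # A feature missing from a method's row contributes 0 (an assumption).
        total = sum(scores.get(feature, 0) for feature in preferred_features)
        ranked.append((method, total))
    # Highest total first; ties are broken alphabetically for determinism.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    print(recommend(SCORE_TABLE, ["linearity", "monotonicity"]))
```

The same function could be run once over the model-specific table and once over the model-agnostic table to produce the two lists.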

6.2 Disadvantage

The evaluation system is the core of evaluating interpretable machine learning, and the score table is the foundation of the recommendation system. Therefore, the comprehensiveness of the evaluation system and the correctness of the scores in the score table are very important.

Rigorous scoring schemes need to be provided to support the recommendation system, and we also need strong evidence that the scores in the score table are valid and reasonable. The evaluation system is convincing only when it fulfills these conditions, and our current version does not yet fully meet them.

6.3 Future

The Score Table

As mentioned in Chapter 6.2 Disadvantage, we need to show that the scores in the score table are valid and reasonable. Therefore, besides developing a rigorous scoring system, we will study the content of the score table further, including the algorithms (e.g., GLM, Decision Tree, Deep Learning), the methods (e.g., PDP, ALE, Surrogate Model), and the features in the table.

For example, how should algorithmic complexity be defined? If we describe it with Big O notation, we also need a strict scoring scheme that maps each complexity class to a score. In other words, we need a mathematical description to support our scores.
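As a purely illustrative sketch of what such a mapping might look like, the snippet below ranks a handful of complexity classes and converts the rank linearly into a score. The list of classes, the 0 to 5 scale, the linear spacing, and the names `COMPLEXITY_ORDER` and `complexity_score` are all assumptions made for this example; a real scheme would need to justify each of these choices.

```python
from typing import List

# Hypothetical ordering of complexity classes, from cheapest to most expensive.
COMPLEXITY_ORDER: List[str] = [
    "O(1)",
    "O(log n)",
    "O(n)",
    "O(n log n)",
    "O(n^2)",
    "O(2^n)",
]


def complexity_score(big_o: str, max_score: float = 5.0) -> float:
    """Map a complexity class to a score; cheaper classes score higher.

    The score decreases linearly with the rank of the class, which is an
    assumed convention rather than a validated scoring rule.
    """
    # list.index raises ValueError if the class is not in the ordering.
    rank = COMPLEXITY_ORDER.index(big_o)
    worst_rank = len(COMPLEXITY_ORDER) - 1
    return max_score * (1 - rank / worst_rank)


if __name__ == "__main__":
    for label in COMPLEXITY_ORDER:
        print(label, round(complexity_score(label), 2))
```

Even this toy version exposes the open questions: whether the spacing between classes should be linear, and how to score an algorithm whose complexity depends on several parameters.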

A lot of research is still needed to build this rigorous score table.

The Recommendation System

So far, our recommendation system is still somewhat manual. Our next step is to make it automatic: a system with a user-friendly interface that allows users to select their preferred features and returns the lists of suggested algorithms and methods automatically. Ideally, the radar chart would also be generated automatically.
