An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data

Julian D. Olden, Michael K. Joy, Russell G. Death

Ecological Modelling, accepted 15 March 2004

https://doi.org/10.1016/j.ecolmodel.2004.03.013

Abstract

Artificial neural networks (ANNs) are receiving greater attention in the ecological sciences as a powerful statistical modeling technique; however, they have also been labeled a “black box” because they are believed to provide little explanatory insight into the contributions of the independent variables in the prediction process. A recent paper published in Ecological Modelling [Review and comparison of methods to study the contribution of variables in artificial neural network models, Ecol. Model. 160 (2003) 249–264] addressed this concern by providing a comprehensive comparison of eight different methodologies for estimating variable importance in neural networks that are commonly used in ecology. Unfortunately, the comparisons were based on an empirical dataset, which precludes establishing generalizations about the true accuracy and precision of the different approaches because the true importance of the variables is unknown. Here, we provide a more appropriate comparison of the different methodologies by using Monte Carlo simulations with data exhibiting defined (and consequently known) numeric relationships. Our results show that a Connection Weight Approach, which uses the raw input-hidden and hidden-output connection weights of the neural network, provides the best methodology for accurately quantifying variable importance and should be favored over the other approaches commonly used in the ecological literature. Average similarity between true and estimated ranked variable importance using this approach was 0.92, whereas similarity coefficients ranged between 0.28 and 0.74 for the other approaches. Furthermore, the Connection Weight Approach was the only method that consistently identified the correct ranked importance of all predictor variables, whereas the other methods identified either only the first few important variables in the network or none at all. The most notable result was that Garson’s Algorithm was the poorest-performing approach, yet it is the most commonly used in the ecological literature. In conclusion, this study provides a robust comparison of different methodologies for assessing variable importance in neural networks that can be generalized to other data and from which valid recommendations can be made for future studies.
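To make the contrast between the two methods concrete, the following is a minimal sketch for a network with a single output neuron. It is not the authors' code. The function names (`connection_weight`, `garson`, `rank_by_magnitude`), the toy weight values, and the choice to rank inputs by absolute value are illustrative assumptions. The Connection Weight Approach sums, over hidden neurons, the products of the raw (signed) input-hidden and hidden-output weights. Garson's Algorithm is shown in its commonly described form: absolute weight products are normalized within each hidden neuron, then summed per input and rescaled to relative importances.

```python
from typing import List, Sequence


def connection_weight(w_ih: Sequence[Sequence[float]], w_ho: Sequence[float]) -> List[float]:
    """Connection Weight Approach (single output): for each input i,
    sum over hidden neurons j of w_ih[i][j] * w_ho[j], keeping signs."""
    return [sum(w * v for w, v in zip(row, w_ho)) for row in w_ih]


def garson(w_ih: Sequence[Sequence[float]], w_ho: Sequence[float]) -> List[float]:
    """Garson's Algorithm in its commonly described form (single output).
    Uses absolute weights, so opposing signals cannot cancel."""
    n_in, n_hidden = len(w_ih), len(w_ho)
    shares = [0.0] * n_in
    for j in range(n_hidden):
        # Absolute input-hidden x hidden-output products for hidden neuron j.
        contrib = [abs(w_ih[i][j]) * abs(w_ho[j]) for i in range(n_in)]
        total = sum(contrib)
        if total == 0:
            continue  # hidden neuron carries no signal; skip to avoid 0/0
        for i in range(n_in):
            shares[i] += contrib[i] / total  # share of neuron j due to input i
    grand = sum(shares)
    return [s / grand for s in shares]  # relative importances summing to 1


def rank_by_magnitude(scores: Sequence[float]) -> List[int]:
    """Input indices ordered from most to least important by |score| (assumption)."""
    return sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)


# Hypothetical weights: 3 inputs x 2 hidden neurons, then 2 hidden -> 1 output.
W = [[2.0, 1.0], [-1.0, 0.5], [0.5, -0.5]]
V = [1.0, -2.0]

print(connection_weight(W, V))                     # [0.0, -2.0, 1.5]
print([round(x, 4) for x in garson(W, V)])         # [0.5357, 0.2679, 0.1964]
print(rank_by_magnitude(connection_weight(W, V)))  # [1, 2, 0]
print(rank_by_magnitude(garson(W, V)))             # [0, 1, 2]
```

In this toy case, input 0 sends strong but opposing signals through the two hidden neurons. The signed products cancel under the Connection Weight Approach, while Garson's Algorithm, which discards signs, ranks it first.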