How to Conduct Rigorous Supervised Machine Learning in Information Systems Research: The Supervised Machine Learning Reportcard

Niklas Kühl, Robin Hirt, Lucas Baier, Björn Schmitz, Gerhard Satzger.

December 2020. Communications of the Association for Information Systems.

[Link]

Abstract

Within the last decade, the application of supervised machine learning (SML) has become increasingly popular in the field of information systems (IS) research. Although the choices among different data preprocessing techniques, as well as different algorithms and their individual implementations, are fundamental building blocks of SML results, their documentation-and therefore reproducibility-is inconsistent across published IS research papers. This may be quite understandable, since the goals and motivations for SML applications vary and since the field has been rapidly evolving within IS. For the IS research community, however, this poses a big challenge, because even with full access to the data neither a complete evaluation of the SML approaches nor a replication of the research results is possible. Therefore, this article aims to provide the IS community with guidelines for comprehensively and rigorously conducting, as well as documenting, SML research: First, we review the literature concerning steps and SML process frameworks to extract relevant problem characteristics and relevant choices to be made in the application of SML. Second, we integrate these into a comprehensive “Supervised Machine Learning Reportcard (SMLR)” as an artifact to be used in future SML endeavors. Third, we apply this reportcard to a set of 121 relevant articles published in renowned IS outlets between 2010 and 2018 and demonstrate how and where the documentation of current IS research articles can be improved. Thus, this work should contribute to a more complete and rigorous application and documentation of SML approaches, thereby enabling a deeper evaluation and reproducibility / replication of results in IS research.​