Revisiting kernel logistic regression under the random utility models perspective. An interpretable machine-learning approach


The success of machine-learning methods is spreading their use to many different fields. This paper analyses one of these methods, Kernel Logistic Regression (KLR), from the point of view of Random Utility Models (RUM) and proposes using KLR to specify the utilities in a RUM, which frees the modeler from the need to postulate a functional relation between the features. A Monte Carlo simulation study is conducted to empirically compare KLR with the Multinomial Logit (MNL) model, the Support Vector Machine (SVM) and Random Forests (RF). On simulated data, KLR is the only method that achieves maximum accuracy and leads to an unbiased willingness-to-pay estimator for non-linear phenomena. In a real travel mode choice problem, RF achieved the highest predictive accuracy, followed by KLR. However, unlike RF, KLR allows for the calculation of indicators such as the value of time, which is of great importance in the context of transportation.
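To illustrate how an indicator such as the value of time could be read off a kernel-based utility, the following is a minimal sketch and not the paper's implementation. It assumes a Gaussian kernel, already-fitted coefficients, and a utility of the form V(x) = sum_i alpha_i k(x_i, x); the value of time is then taken as the ratio of the partial derivatives of V with respect to travel time and travel cost. All names and numbers (`anchors`, `alpha`, `gamma`, the feature order) are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2) (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def utility(x, anchors, alpha, gamma):
    """Kernel utility V(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * rbf_kernel(x, z, gamma) for a, z in zip(alpha, anchors))

def utility_gradient(x, anchors, alpha, gamma):
    """Analytic gradient of V with respect to each feature of x."""
    grad = [0.0] * len(x)
    for a, z in zip(alpha, anchors):
        k = rbf_kernel(x, z, gamma)
        for d in range(len(x)):
            # d/dx_d exp(-gamma * ||x - z||^2) = -2 * gamma * (x_d - z_d) * k
            grad[d] += a * k * (-2.0 * gamma) * (x[d] - z[d])
    return grad

def value_of_time(x, anchors, alpha, gamma, time_idx=0, cost_idx=1):
    """Marginal rate of substitution between time and cost, evaluated at x."""
    grad = utility_gradient(x, anchors, alpha, gamma)
    return grad[time_idx] / grad[cost_idx]

# Hypothetical inputs: each point is (travel time, travel cost).
anchors = [(10.0, 2.0), (30.0, 1.0), (20.0, 4.0)]  # stand-ins for training points
alpha = [0.8, -0.5, -0.3]                          # stand-ins for fitted coefficients
gamma = 0.01                                       # assumed kernel width parameter
x = (15.0, 2.5)                                    # alternative being evaluated

vot = value_of_time(x, anchors, alpha, gamma)
print(f"value of time at x: {vot:.3f} cost units per time unit")
```

Because the utility is non-linear in the features, the ratio depends on the point `x` at which it is evaluated, in contrast to a linear-in-parameters MNL specification where it is a constant ratio of coefficients.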

Transportation Letters