Using mixed-effect random forest models to capture spatial patterns: a case study on urban crime

Increasing access to spatio-temporal datasets, data-driven modelling methods and computational power has transformed the way we do science. Yet most geodata-driven approaches currently disregard the spatial and temporal aspects of the data they are based on. Here we present and evaluate a hybrid machine learning approach that combines statistical mixed-effects theory with the power of random forests. This approach, known as mixed-effects random forests (MERF), is used to model monthly crime in New York City (USA). Our results show that MERF leads to lower prediction errors and lower spatial autocorrelation in the residuals than a standard random forest model. This shows that the non-geocomputational nature of machine learning methods can be mitigated.
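To make the residual diagnostic concrete, the following minimal sketch computes global Moran's I, a common measure of spatial autocorrelation. It is an illustration only: the text above does not state which autocorrelation statistic was used, so the choice of Moran's I is an assumption, and the function names (morans_i, chain_weights) and the six-unit toy residuals are hypothetical rather than taken from the study.

```python
def morans_i(values, weights):
    """Global Moran's I of `values` under a spatial weight matrix `weights`.

    `weights[i][j]` is the spatial weight between units i and j
    (0 when they are not neighbours).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all ordered pairs
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in z)
    return (n / w_total) * num / den


def chain_weights(n):
    """Binary contiguity for n units on a line: adjacent indices are neighbours.

    Hypothetical toy layout, standing in for real spatial units.
    """
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


w = chain_weights(6)

# Toy residuals (made up for illustration, not results from the study)
clustered = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # similar values sit together
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbours always differ

i_clustered = morans_i(clustered, w)      # positive: about 0.6
i_alternating = morans_i(alternating, w)  # negative: about -1.0
```

Under this reading, residuals whose Moran's I lies closer to zero indicate that less spatial structure has been left unexplained by the model.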