Integration of precision farming data and spatial statistical modelling to interpret field-scale maize grain yield variability in New Zealand

Spatial variability in soil, crop, and topographic features, combined with temporal variability in weather, can produce within-paddock yield patterns that vary from year to year. The complex interactions between these yield-limiting factors require specialist statistical processing to quantify spatial and temporal variability and thus inform crop management practices.
This paper evaluates multivariate linear regression (MLR) and a Cubist regression model for predicting the spatial variability of maize grain yield at two sites in the Waikato Region, New Zealand. The variables considered were: crop reflectance data from satellite imagery (Sentinel-2 and Landsat 8), soil electrical conductivity (EC), soil organic matter (OM), elevation, rainfall, temperature, solar radiation, and seeding density. For the multiple-year analysis, the datasets were split into training (75%) and validation (25%) sets, and both models were trained using 10-fold cross-validation. Statistical performance was also evaluated with a leave-one-year-out analysis, in which one year of yield data was held out as the validation set in each iteration and all remaining years formed the training set for building the prediction models.
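The leave-one-year-out scheme described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `leave_one_year_out`, the record fields (`year`, `ec`, `yield_value`), and the sample values are all hypothetical and chosen only to show how each year is held out in turn.

```python
from collections import defaultdict


def leave_one_year_out(records, year_key="year"):
    """Yield (held_out_year, train, validation) once per distinct year.

    `records` is a list of dicts; `year_key` names the field holding the
    harvest year. Both are assumptions made for this sketch.
    """
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec[year_key]].append(rec)

    for held_out in sorted(by_year):
        # The held-out year is the validation set for this iteration.
        validation = by_year[held_out]
        # Every other year goes into the training set.
        train = [r for year, recs in by_year.items()
                 if year != held_out for r in recs]
        yield held_out, train, validation


# Made-up illustrative records (not data from the study).
records = [
    {"year": 2014, "ec": 12.1, "yield_value": 11.3},
    {"year": 2014, "ec": 15.4, "yield_value": 12.8},
    {"year": 2015, "ec": 11.7, "yield_value": 9.6},
    {"year": 2016, "ec": 13.9, "yield_value": 13.1},
    {"year": 2016, "ec": 14.2, "yield_value": 12.4},
]

folds = list(leave_one_year_out(records))
for held_out, train, validation in folds:
    print(held_out, len(train), len(validation))
```

Splitting by year rather than at random keeps every observation from the held-out season out of training, so the score reflects prediction for a season the model has not seen.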
In the multiple-year analysis, the Cubist model (RMSE = 1.47 and R² = 0.82 for Site 1; RMSE = 2.13 and R² = 0.72 for Site 2) predicted the validation set better than the MLR model (RMSE = 2.41 and R² = 0.51 for Site 1; RMSE = 3.37 and R² = 0.30 for Site 2). However, in the leave-one-year-out analyses for Site 1, the MLR model provided better predictions (RMSE = 1.57 to 4.93; R² = 0.15 to 0.31) than the Cubist model (RMSE = 2.62 to 5.9; R² = 0.05 to 0.14). For Site 2, both models produced poor results.
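For readers unfamiliar with the two reported statistics, the sketch below computes RMSE and R² from observed and predicted values. It assumes R² is the coefficient of determination, 1 - SS_res/SS_tot; the text does not state which definition was used (a squared correlation coefficient is another common choice), and the function names and sample numbers are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration (not data from the study).
observed = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]

print(round(rmse(observed, predicted), 3))       # 0.816
print(round(r_squared(observed, predicted), 3))  # 0.75
```

A lower RMSE and a higher R² both indicate closer agreement between predicted and observed yield, which is the sense in which one model is described as better than the other.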
Yield data from additional years and the inclusion of more independent variables (e.g. soil fertility and texture) may improve the models. This analysis demonstrates the potential of statistical modelling of spatial and temporal data to assist farm management decisions (e.g. variable rate application, precision land levelling, irrigation, and drainage). Once the functional relationship between within-paddock yield potential and complementary variables is established, it should be possible to provide an accurate management prescription, enabling variable rates of an input (e.g. plant density, fertiliser) to be applied automatically across the paddock based on the “yield-input” response curve.