Statistical species distribution models (SDMs) are increasingly used to project spatial relocations of marine taxa under future climate change scenarios. However, tests of their predictive skill in the real world are rare. Here, we use data from the Continuous Plankton Recorder program, one of the longest-running and most extensive marine biological monitoring programs, to investigate the reliability of predicted plankton distributions. We apply three commonly used SDMs to 20 representative plankton species, including copepods, diatoms, and dinoflagellates, all found in the North Atlantic and adjacent seas. We fit the models to decadal subsets of the full (1958–2012) dataset, and then use them to predict both forward and backward in time, comparing the model predictions against the corresponding observations. The probability of correctly predicting presence was low, peaking at 0.5 for copepods, and model skill typically did not exceed that of a null model assuming distributions to be constant in time. The predicted prevalence diverged increasingly from the observed prevalence as the time between the prediction period and the training dataset grew. More detailed investigations based on four focal species revealed strong spatial variations in skill, with the least skill at the edges of the distributions, where prevalence is lowest. Furthermore, traditional single-value model performance metrics gave contrasting scores, and some implied overly optimistic conclusions about model skill. Plankton may be particularly challenging to model because of its short life span and the dispersive effects of constant water movements on all spatial scales; however, there are few other studies against which to compare these results. We conclude that rigorous model validation, including comparison against null models, is essential to assess the robustness of projections of marine planktonic species under climate change.
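The null-model comparison described above can be illustrated with a minimal sketch. It assumes presence/absence is recorded per grid cell and that the null model simply carries the training-period observations forward unchanged. The true skill statistic (sensitivity + specificity - 1) is used here only as one possible score and is not necessarily the metric used in the study. All function names are hypothetical, and the data are made-up toy values, not results.

```python
from typing import Sequence, Tuple


def confusion(obs: Sequence[int], pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for paired presence/absence values (1 or 0)."""
    tp = sum(1 for o, p in zip(obs, pred) if o == 1 and p == 1)
    fp = sum(1 for o, p in zip(obs, pred) if o == 0 and p == 1)
    fn = sum(1 for o, p in zip(obs, pred) if o == 1 and p == 0)
    tn = sum(1 for o, p in zip(obs, pred) if o == 0 and p == 0)
    return tp, fp, fn, tn


def true_skill_statistic(obs: Sequence[int], pred: Sequence[int]) -> float:
    """Sensitivity + specificity - 1; 0 means no better than chance."""
    tp, fp, fn, tn = confusion(obs, pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


def prevalence(values: Sequence[int]) -> float:
    """Fraction of grid cells where the species is present."""
    return sum(values) / len(values)


# Toy presence/absence values for eight grid cells (illustrative only).
train_obs = [1, 1, 1, 0, 0, 0, 1, 0]  # observed in the training decade
later_obs = [1, 1, 0, 0, 0, 1, 1, 0]  # observed in a later decade
sdm_pred = [1, 0, 0, 0, 1, 0, 1, 0]   # hypothetical SDM prediction for the later decade

# Null model: assume the distribution is constant in time.
null_pred = train_obs

sdm_skill = true_skill_statistic(later_obs, sdm_pred)
null_skill = true_skill_statistic(later_obs, null_pred)

print(f"SDM skill:  {sdm_skill:.2f}")
print(f"Null skill: {null_skill:.2f}")
print(f"Prevalence, predicted vs observed: "
      f"{prevalence(sdm_pred):.3f} vs {prevalence(later_obs):.3f}")
print("SDM outperforms null model:", sdm_skill > null_skill)
```

In this toy case the hypothetical SDM scores below the constant-distribution null, which is the kind of outcome a validation against a null model is meant to expose.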