Spatiotemporal interactions affect visual performance under repeated stimulation conditions, showing both incremental (commonly related to learning) and decremental (possibly sensory adaptation) effects. Here we examined the role of spatiotemporal consistencies in learning dynamics and transfer. The backward-masked texture-discrimination paradigm was used, with stimulus onset asynchrony (SOA) controlling the observers’ performance level. Temporal consistencies were examined by modifying the order in which SOA was varied during a training session: gradually reduced SOA (high consistencies) versus randomized SOA (low consistencies). Spatial consistencies were reduced by interleaving standard target trials with ‘dummy’ trials containing only the background texture (no target; background oriented 45° relative to the target’s orientation). Our results showed reduced improvement following training with gradual SOA, as compared with random SOA. However, this difference was eliminated by randomizing SOA only at the initial and final segments of training, revealing a contaminating effect of temporal consistencies on threshold estimation rather than on learning. Inserting the ‘dummy’ trials (reduced spatial consistencies) facilitated both the learning and the subsequent transfer of learning, but only when sufficient pre-training was provided. These results indicate that visual sensitivity depends on a balance between two opposing processes, perceptual learning and sensory adaptation, both of which depend on spatiotemporal consistencies. Reducing spatiotemporal consistencies during training reduces the short-term spatiotemporal interactions that interfere with threshold estimation, learning, and generalization of learning. We consider the results within a theoretical framework that assumes an adaptable low-level network and a readout mechanism, with orientation- and location-specific low-level adaptation interfering with the readout learning.