The ability to select between actions that are more versus less likely to be reinforced is necessary for survival and for navigating a changing environment. A task termed “response-outcome contingency degradation” can be used in the laboratory to determine whether rodents behave according to such goal-directed response strategies. In one iteration of this task, rodents are trained to perform two food-reinforced behaviors; the predictive relationship between one instrumental response and its associated outcome is then degraded by delivering the reinforcer associated with that response non-contingently. During a subsequent probe test, animals can select between the two trained responses. Preferential engagement of the behavior most likely to be reinforced is considered goal-directed, whereas non-selective responding is considered a failure in response-outcome conditioning, or “habitual” behavior. This test has largely been used with rats, and less so with mice. Here we compiled data collected from several cohorts of mice tested in our lab between 2012 and 2015. Mice were bred on either a C57BL/6 or predominantly BALB/c strain background. We report that both strains of mice can use information acquired during instrumental contingency degradation training to select, from among multiple response options, the response most likely to be reinforced. The strains differ, however, during the training sessions in which the familiar response-outcome contingency is violated. BALB/c mice readily generate perseverative or habit-like response strategies when the only available response is unlikely to be reinforced, while C57BL/6 mice more readily inhibit responding. These findings provide evidence of strain differences in response strategies when an anticipated reinforcer is unlikely to be delivered.