In nonmonotonic decision problems, the magnitude of outcomes can both increase and decrease over time depending on the state of the decision problem. These increases and decreases may occur repeatedly and result in a variety of possible outcome distributions. In many previously investigated sequential decision problems, in contrast, outcomes (or the probabilities of obtaining specific outcomes) change monotonically in 1 direction. To investigate how and to what extent people learn in nonmonotonic decision problems, we developed a new task, the Sequential Investment Task (SIT), in which people sequentially decide whether or not to sell shares at several selling points over the course of virtual days. Across trials, they can learn which selling point yields the highest payoff in a specific market. The results of 2 experiments suggest that a reinforcement-learning model generally describes participants' learning processes best. Learning largely depends on an interaction of the complexity of the stochastic process that generates the outcome distribution (i.e., whether the peak selling point is early or late in the selling period and whether there are single or multiple payoff maxima) and the amount of feedback that is available for learning. Although the risk profile in nonmonotonic decision problems renders exploration relatively safe, a clear gap persisted between the choices of people receiving partial feedback (thus facing an exploration–exploitation trade-off) and those of people receiving full feedback: Only the choices of the latter consistently approximated the peak selling points.