This paper describes several massively parallel implementations for a global search algorithm DIRECT. Two parallel schemes take different approaches to address DIRECT's design challenges imposed by memory requirements and data dependency. Three design aspects in topology, data structures, and task allocation are compared in detail. The goal is to analytically investigate the strengths and weaknesses of these parallel schemes, identify several key sources of inefficiency, and experimentally evaluate a number of improvements in the latest parallel DIRECT implementation. The performance studies demonstrate improved data structure efficiency and load balancing on a 2200 processor cluster.