Management of program data to improve data locality and reduce false sharing is critical for scaling performance on NUMA shared memory multiprocessors. We use HPF-like data decomposition directives to partition and place arrays in data-parallel applications on Hector, a shared-memory NUMA multiprocessor. We describe a compiler system for automating the partitioning and placement of arrays. The compiler exploits Hectors shared memory architecture to efficiently implement distributed arrays. Experimental results from a prototype implementation demonstrate the effectiveness of these techniques. They also demonstrate the magnitude of the performance improvement attainable when our compiler-based data management schemes are used instead of operating system data management policies; performance improves by up to a factor of 5.