Advances in biophysical multi-compartment modeling for diffusion MRI (dMRI) have gained popularity because of greater specificity than DTI in relating the dMRI signal to underlying cellular microstructure. A large range of these diffusion microstructure models have been developed and each of the popular models comes with its own, often different, optimization algorithm, noise model and initialization strategy to estimate its parameter maps. Since data fit, accuracy and precision is hard to verify, this creates additional challenges to comparability and generalization of results from diffusion microstructure models. In addition, non-linear optimization is computationally expensive leading to very long run times, which can be prohibitive in large group or population studies. In this technical note we investigate the performance of several optimization algorithms and initialization strategies over a few of the most popular diffusion microstructure models, including NODDI and CHARMED. We evaluate whether a single well performing optimization approach exists that could be applied to many models and would equate both run time and fit aspects. All models, algorithms and strategies were implemented on the Graphics Processing Unit (GPU) to remove run time constraints, with which we achieve whole brain dataset fits in seconds to minutes. We then evaluated fit, accuracy, precision and run time for different models of differing complexity against three common optimization algorithms and three parameter initialization strategies. Variability of the achieved quality of fit in actual data was evaluated on ten subjects of each of two population studies with a different acquisition protocol. We find that optimization algorithms and multi-step optimization approaches have a considerable influence on performance and stability over subjects and over acquisition protocols. The gradient-free Powell conjugate-direction algorithm was found to outperform other common algorithms in terms of run time, fit, accuracy and precision. Parameter initialization approaches were found to be relevant especially for more complex models, such as those involving several fiber orientations per voxel. For these, a fitting cascade initializing or fixing parameter values in a later optimization step from simpler models in an earlier optimization step further improved run time, fit, accuracy and precision compared to a single step fit. This establishes and makes available standards by which robust fit and accuracy can be achieved in shorter run times. This is especially relevant for the use of diffusion microstructure modeling in large group or population studies and in combining microstructure parameter maps with tractography results.