This paper addresses two key research issues in computer vision, the detection and tracking of multiple objects in cluttered dynamic scenes, which underpin the intelligence of advanced visual surveillance systems aimed at automated visual event detection and behaviour analysis. We present two major contributions towards resolving these problems within a systematic framework. Firstly, for accurate object detection, we propose an efficient and effective scheme that removes cast shadows and highlights, with error correction based on conditional morphological reconstruction. Secondly, for effective tracking, we introduce a temporal-template-based tracking scheme that uses multiple descriptive cues (velocity, shape, colour, etc.) of the 2-D object appearance together with their respective variances over time. A scaled Euclidean distance is used as the matching metric, and the template is updated using Kalman filters when a match is found, or by linear mean prediction in the case of occlusion. Extensive experiments are carried out on video sequences from various real-world scenarios, and the results show promising tracking performance.
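To make the matching step concrete, the following is a minimal sketch of a scaled Euclidean distance between a template and candidate observations. It is an illustration under stated assumptions, not the exact formulation used in this work: each squared feature difference is assumed to be divided by the template's variance for that feature, and the names `scaled_distance`, `best_match`, `threshold`, and `eps` are hypothetical.

```python
import math


def scaled_distance(template_mean, template_var, observation, eps=1e-9):
    """Euclidean distance with each squared difference divided by the
    template's per-feature variance (assumed form of the scaling)."""
    return math.sqrt(sum(
        (o - m) ** 2 / (v + eps)  # eps guards against a zero variance
        for m, v, o in zip(template_mean, template_var, observation)
    ))


def best_match(template_mean, template_var, candidates, threshold):
    """Return the index of the closest candidate whose distance is below
    the threshold, or None when no candidate qualifies."""
    best_idx, best_d = None, threshold
    for i, obs in enumerate(candidates):
        d = scaled_distance(template_mean, template_var, obs)
        if d < best_d:
            best_idx, best_d = i, d
    return best_idx


# Hypothetical two-feature template and three candidate observations.
mean, var = [0.0, 0.0], [1.0, 4.0]
candidates = [[3.0, 4.0], [1.0, 2.0], [10.0, 10.0]]
match = best_match(mean, var, candidates, threshold=3.0)
```

In this sketch a returned index would trigger the template update for the matched object, while `None` would correspond to the no-match case, such as occlusion, in which the template is propagated by prediction instead. Scaling by variance lets features with large natural variation contribute less to the distance than stable ones.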