In arterial spin labeling (ASL), a perfusion-weighted image is obtained by subtracting a label image from a control image. This perfusion-weighted image has an intrinsically low signal-to-noise ratio (SNR), and numerous measurements are required to achieve reliable image quality, especially at higher spatial resolutions. To overcome this limitation, various denoising approaches have been published that take the perfusion-weighted image as input. In this study, we propose a new spatio-temporal filtering approach based on total generalized variation (TGV) regularization that exploits the inherent information of control and label pairs simultaneously. In this way, the temporal and spatial similarities of all images are used to jointly denoise the control and label images. To assess the effect of denoising, virtual ground-truth data were produced at different SNR levels. Furthermore, high-resolution in vivo pulsed ASL data sets were acquired and processed. The results show improved image quality, quantitative accuracy, and robustness against outliers compared to seven state-of-the-art denoising approaches.
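
The control-label subtraction and the need for many repetitions can be illustrated with a minimal NumPy sketch. This is not the proposed TGV filter; it shows only the conventional pairwise subtraction followed by averaging. The function name `perfusion_weighted`, the array layout (repetitions along axis 0), and all signal and noise values are assumptions chosen for illustration, not values from the study.

```python
import numpy as np

def perfusion_weighted(control, label):
    """Mean pairwise difference (control - label) over the repetition axis 0."""
    control = np.asarray(control, dtype=float)
    label = np.asarray(label, dtype=float)
    if control.shape != label.shape:
        raise ValueError("control and label must have the same shape")
    return (control - label).mean(axis=0)

# Synthetic, hypothetical numbers for illustration only.
rng = np.random.default_rng(0)
n_pairs, shape = 40, (8, 8)
baseline = 100.0   # assumed static tissue signal (arbitrary units)
true_diff = 1.0    # assumed perfusion signal, about 1% of baseline
noise_sd = 5.0     # assumed noise standard deviation of a single image

control = baseline + rng.normal(0.0, noise_sd, (n_pairs, *shape))
label = baseline - true_diff + rng.normal(0.0, noise_sd, (n_pairs, *shape))

pwi = perfusion_weighted(control, label)

# A single difference image has noise sqrt(2) * noise_sd; averaging N
# independent pairs reduces it by a factor of sqrt(N).
expected_sd = np.sqrt(2.0) * noise_sd / np.sqrt(n_pairs)
```

Under these assumptions the noise in the averaged difference falls only with the square root of the number of pairs, which is why plain averaging needs many measurements before the small perfusion signal becomes visible.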