Precise delineation of organs at risk is a crucial task in radiotherapy treatment planning for delivering high doses to the tumor while sparing healthy tissues. In recent years, automated segmentation methods have shown an increasingly high performance for the delineation of various anatomical structures. However, this task remains challenging for organs like the esophagus, which have a versatile shape and poor contrast to neighboring tissues. For human experts, segmenting the esophagus from CT images is a time-consuming and error-prone process. To tackle these issues, we propose a random walker approach driven by a 3D fully convolutional neural network (CNN) to automatically segment the esophagus from CT images.Methods:
First, a soft probability map is generated by the CNN. Then, an active contour model (ACM) is fitted to the CNN soft probability map to get a first estimation of the esophagus location. The outputs of the CNN and ACM are then used in conjunction with a probability model based on CT Hounsfield (HU) values to drive the random walker. Training and evaluation were done on 50 CTs from two different datasets, with clinically used peer-reviewed esophagus contours. Results were assessed regarding spatial overlap and shape similarity.Results:
The esophagus contours generated by the proposed algorithm showed a mean Dice coefficient of 0.76 ± 0.11, an average symmetric square distance of 1.36 ± 0.90 mm, and an average Hausdorff distance of 11.68 ± 6.80, compared to the reference contours. These results translate to a very good agreement with reference contours and an increase in accuracy compared to existing methods. Furthermore, when considering the results reported in the literature for the publicly available Synapse dataset, our method outperformed all existing approaches, which suggests that the proposed method represents the current state-of-the-art for automatic esophagus segmentation.Conclusion:
We show that a CNN can yield accurate estimations of esophagus location, and that the results of this model can be refined by a random walk step taking pixel intensities and neighborhood relationships into account. One of the main advantages of our network over previous methods is that it performs 3D convolutions, thus fully exploiting the 3D spatial context and performing an efficient volume-wise prediction. The whole segmentation process is fully automatic and yields esophagus delineations in very good agreement with the gold standard, showing that it can compete with previously published methods.