Despite the benefits introduced by robotic systems in abdominal Minimally Invasive Surgery (MIS), major complications can still affect the outcome of the procedure, such as intra-operative bleeding. One of the causes is attributed to accidental damages to arteries or veins by the surgical tools, and some of the possible risk factors are related to the lack of sub-surface visibilty. Assistive tools guiding the surgical gestures to prevent these kind of injuries would represent a relevant step towards safer clinical procedures. However, it is still challenging to develop computer vision systems able to fulfill the main requirements: (i) long term robustness, (ii) adaptation to environment/object variation and (iii) real time processing.
The purpose of this paper is to develop computer vision algorithms to robustly track soft tissue areas (Safety Area, SA), defined intra-operatively by the surgeon based on the real-time endoscopic images, or registered from a pre-operative surgical plan. We propose a framework to combine an optical flow algorithm with a tracking-by-detection approach in order to be robust against failures caused by: (i) partial occlusion, (ii) total occlusion, (iii) SA out of the field of view, (iv) deformation, (v) illumination changes, (vi) abrupt camera motion, (vii), blur and (viii) smoke. A Bayesian inference-based approach is used to detect the failure of the tracker, based on online context information. A Model Update Strategy (MUpS) is also proposed to improve the SA re-detection after failures, taking into account the changes of appearance of the SA model due to contact with instruments or image noise. The performance of the algorithm was assessed on two datasets, representing ex-vivo organs and in-vivo surgical scenarios. Results show that the proposed framework, enhanced with MUpS, is capable of maintain high tracking performance for extended periods of time ( ≃ 4 min - containing the aforementioned events) with high precision (0.7) and recall (0.8) values, and with a recovery time after a failure between 1 and 8 frames in the worst case.