Guideline panels need to process a sizeable amount of information before deciding whether or not to recommend a health technology. The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach is frequently applied in guideline development to facilitate this task, typically for the synthesis of effectiveness research. Questions about the accuracy of medical tests are ubiquitous, and they temporally precede questions about therapy. However, the literature summarising experience of applying the GRADE approach to accuracy evaluations is less extensive than that for effectiveness evidence. Several features make this task challenging. These include the type of study design (cross-sectional), the two-dimensional nature of the performance measures (sensitivity and specificity), a propensity towards higher between-study heterogeneity, poor reporting of quality features, and uncertainty about how best to assess publication bias. This article presents the solutions adopted to address these challenges, enabling judicious estimation of the strength of the test accuracy evidence used to inform evidence syntheses for guideline development.