There is growing interest in conducting public health research using data from social media. In particular, Twitter “infoveillance” has demonstrated utility across health contexts. However, rigorous and reproducible methodologies for using Twitter data in public health are not yet well articulated, particularly for content analysis, a widely used approach.
In 2014, we gathered an interdisciplinary team of health science researchers, computer scientists, and methodologists to begin implementing an open-source framework for real-time infoveillance of Twitter health messages (RITHM). Through this process, we documented common challenges and novel solutions to inform future work in real-time Twitter data collection and subsequent human coding.
The RITHM framework allows researchers and practitioners to use well-planned and reproducible processes for retrieving, storing, filtering, subsampling, and formatting data on health topics of interest. Further considerations for human coding of Twitter data include coder selection and training, data representation, codebook development and refinement, and monitoring of coding accuracy and productivity. We illustrate these methodological considerations through practical examples from formative work related to hookah tobacco smoking, and we reference essential methods literature related to understanding and using Twitter data.
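
To make the filtering and subsampling steps concrete, the following is a minimal sketch in Python and not the RITHM implementation itself. The keyword list, the tweet dictionary layout, the function names, and the seed value are all assumptions chosen for illustration. It shows keyword-based filtering of already-retrieved tweets followed by a seeded random subsample, so that the same set of tweets can be drawn again for human coding.

```python
import random
import re

# Hypothetical keyword list for the hookah example; not taken from RITHM.
KEYWORDS = ["hookah", "shisha", "waterpipe"]


def matches_topic(text, keywords):
    """Return True if any keyword occurs as a whole word, ignoring case."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def filter_tweets(tweets, keywords):
    """Keep only tweets whose text matches at least one keyword."""
    return [t for t in tweets if matches_topic(t["text"], keywords)]


def subsample(tweets, n, seed):
    """Draw a reproducible simple random sample of up to n tweets."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    if n >= len(tweets):
        return list(tweets)
    return rng.sample(tweets, n)


# Made-up tweets, assumed to be stored as dicts with "id" and "text" keys.
tweets = [
    {"id": 1, "text": "Trying a new hookah lounge tonight"},
    {"id": 2, "text": "Traffic is terrible today"},
    {"id": 3, "text": "Shisha with friends"},
    {"id": 4, "text": "Is waterpipe smoking safer than cigarettes?"},
]

relevant = filter_tweets(tweets, KEYWORDS)
sample = subsample(relevant, 2, seed=42)
```

Whole-word matching is a deliberate simplification: it misses plurals, hashtags written as one word, and misspellings, so a real keyword list would need to be refined against the data. Recording the seed alongside the sample is one simple way to keep the subsampling step reproducible.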