Social Media Data for Public Health Surveillance
Originally written for CS 410: Text Information Systems at UIUC in Fall 2019
Generally, the aim of public health is to prevent disease and promote health in populations.¹ Public health surveillance is the ongoing, systematic collection, analysis, and interpretation of health-related data essential to the planning, implementation, and evaluation of public health practice.² Social media is both a valuable platform for disseminating health education information and a valuable data source for collecting health information.³ Indeed, many public health agencies use social media to educate the public on disease risks and healthy lifestyle choices. The CDC has even released a social media toolkit to guide federal, state, and local agencies on effective health communication via social media.⁴ With growing adoption by the general population and by public health agencies, social media is proving to be a valuable data source to support traditional public health practice activities. The aim of this technology review is to examine the use of social media for public health syndromic surveillance.
New tools are being developed to harness unstructured text data sources for public health surveillance. One such source is social media text, which researchers have proposed using to support detection of disease outbreaks and disease incidence via self-reported symptoms.³ Such data sources give public health practitioners an opportunity to forecast disease trends with advanced forecasting methods. However, selection bias can limit the generalizability of social media data, since social media users may differ from non-users and privacy settings may restrict access to additional, more representative data.³ The literature therefore generally recommends using social media data in public health surveillance in conjunction with traditional syndromic surveillance data, such as emergency department visit data, rather than as a replacement.³
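One simple way to read a social media signal alongside a traditional one is to check how closely the two weekly series move together. The sketch below is only an illustration under stated assumptions: the weekly counts are made-up toy numbers, the names `weekly_symptom_users` and `weekly_ed_visits` are hypothetical, and Pearson correlation is one possible comparison, not a method taken from the cited literature.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Toy, invented data: users reporting symptoms per week on social media,
# and emergency department visits for the same weeks.
weekly_symptom_users = [12, 18, 25, 40, 33, 20]
weekly_ed_visits = [30, 41, 55, 90, 70, 45]

r = pearson(weekly_symptom_users, weekly_ed_visits)
print(f"correlation between the two weekly series: {r:.3f}")
```

A high correlation on historical data would not by itself validate the social media signal, but a low one would be an early warning that the signal is not tracking the traditional data it is meant to complement.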
Several studies have applied text mining techniques such as sentiment analysis, text classification, and user classification to Twitter health data as a proof of concept for public health surveillance.⁵ ⁶ A recent review article found that public health social media surveillance applications in the literature have so far included disease monitoring, analysis of public reaction to health-related issues, early detection of outbreaks and emergency situations, disease prediction, public lifestyle classification, geolocation to track disease and outbreak occurrence, and general applications to improve detection of health-related information.⁶ Researchers have even used Twitter data to estimate life outlook and risk of depression or seasonal affective disorder, views toward healthy behaviors, and adverse events from drug treatments, all based on tweets.⁶ The review suggested a greater focus on users, rather than tweets, as the unit of measure when predicting disease cases, since many tweets are posted by a few extremely active users.⁶ Several models, methods, and algorithms have been applied to social media datasets for public health surveillance, including the Ailment Topic Aspect Model (ATAM), Latent Dirichlet Allocation (LDA), Linguistic Inquiry and Word Count (LIWC), Naive Bayes classifiers, Support Vector Machines (SVM), and K-Nearest Neighbors (KNN).⁶ The studies examined in that review demonstrate proof of concept for using Twitter to quantify health signals and, more generally, for public health surveillance.⁶
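The review's point about counting users rather than tweets can be made concrete with a small sketch. Everything here is an assumption for illustration: the `SYMPTOM_TERMS` keyword set and the `mentions_symptom` function are crude, hypothetical stand-ins for a trained classifier such as Naive Bayes or an SVM, and the sample tweets are invented.

```python
from collections import defaultdict

# Hypothetical keyword list; a real system would use a trained classifier.
SYMPTOM_TERMS = {"fever", "cough", "flu", "chills"}


def mentions_symptom(text):
    """Crude stand-in for a classifier: True if any symptom term appears."""
    words = {w.strip(".,!?#@").lower() for w in text.split()}
    return bool(words & SYMPTOM_TERMS)


def count_signals(tweets):
    """Return (tweet-level count, user-level count) of symptom mentions.

    `tweets` is an iterable of (user_id, text) pairs.
    """
    per_user = defaultdict(int)
    for user_id, text in tweets:
        if mentions_symptom(text):
            per_user[user_id] += 1
    tweet_count = sum(per_user.values())  # every matching tweet counts
    user_count = len(per_user)            # each user counts at most once
    return tweet_count, user_count


# Invented example data: one very active user and two occasional users.
sample = [
    ("u1", "Ugh, fever and chills all night"),
    ("u1", "Still have this cough"),
    ("u1", "flu day three"),
    ("u2", "Great weather today"),
    ("u3", "I think I caught the flu"),
]

tweet_count, user_count = count_signals(sample)
print(f"symptom tweets: {tweet_count}, distinct users: {user_count}")
```

In this toy data the tweet-level count is twice the user-level count because a single user posted three of the four matching tweets, which is the kind of inflation the review warns about when tweets are treated as cases.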
Specifically, studies have focused on many different diseases, such as influenza, foodborne illnesses, and dengue; the monitoring of adverse reactions to medication; health behaviors such as smoking and diet; and even malpractice.⁶ ⁷ ⁸ In recent years, social media-based health research has focused more on mental health and substance abuse.⁹ ¹⁰ Several models have been developed that showcase the potential of real-time syndromic surveillance based on Twitter data.⁵ ⁶ ⁹ ¹⁰ However, the literature does not appear to show many implementations at the local health department level in major US cities, indicating a need for more robust, validated implementations that can be effectively adopted in public health practice. One example is the New York City health department, which successfully used Twitter text mining in 2016 to enhance foodborne illness complaint and outbreak detection.¹¹ Several platforms have recently been developed that may attract public health professionals to use Twitter for public health surveillance, yet such platforms have not received broad adoption.¹² ¹³ ¹⁴ Partnerships between computer science researchers and public health practice partners are needed to validate methods and support adoption of social media surveillance tools. Indeed, there appear to be few collaborations with public health departments in the sub-field thus far. As other researchers have indicated, this may reflect a lack of trust or confidence in social media surveillance following failures by previous systems such as Google Flu Trends.⁸ Additionally, public health practitioners may not have sufficient funding to implement such technologies effectively or may not recognize the value of the technologies being developed.
This may therefore present an opportunity for researchers to inform and collaborate with public health practitioners, validate findings, and allocate funding. More open-source toolkits and platforms designed for ease of use with free access are also needed to enable adoption among public health practitioners in low-funded jurisdictions.⁶ ⁹ ¹⁰ As the literature so far indicates, syndromic surveillance using social media data appears to have the potential to support and improve traditional disease surveillance in public health practice. Such surveillance systems can support many unique use cases in public health practice, including specialized health communication, health education, outreach, and treatment. With more collaboration, public health agencies may be able to more effectively support their mission to prevent disease and promote health, and researchers may be able to validate their findings and translate them into practice by using social media data.
References
1. Introduction to Public Health | Public Health 101 Series | CDC. https://www.cdc.gov/publichealth101/public-health.html (2018).
2. Introduction to Public Health Surveillance | Public Health 101 Series | CDC. https://www.cdc.gov/publichealth101/surveillance.html (2018).
3. Fung, I. C.-H., Tse, Z. T. H. & Fu, K.-W. The use of social media in public health surveillance. Western Pacific Surveillance and Response 6, (2015).
4. CDC Social Media Tools, Guidelines & Best Practices | Social Media | CDC. https://www.cdc.gov/socialmedia/tools/guidelines/index.html (2018).
5. Yang, Y. T., Horneffer, M. & DiLisio, N. Mining Social Media and Web Searches For Disease Detection. J Public Health Res 2, 17–21 (2013).
6. Jordan, S. E. et al. Using Twitter for Public Health Surveillance from Monitoring and Prediction to Public Response. Data 4, 6 (2019).
7. Kazemi, D. M., Borsari, B., Levine, M. J. & Dooley, B. Systematic review of surveillance by social media platforms for illicit drug use. J Public Health (Oxf) 39, 763–776 (2017).
8. Paul, M. J. et al. Social media mining for public health monitoring and surveillance. in Pacific Symposium on Biocomputing 2016, PSB 2016 468–479 (World Scientific Publishing Co. Pte Ltd, 2016).
9. Mackey, T., Kalyanam, J., Klugman, J., Kuzmenko, E. & Gupta, R. Solution to Detect, Classify, and Report Illicit Online Marketing and Sales of Controlled Substances via Twitter: Using Machine Learning and Web Forensics to Combat Digital Opioid Access. J Med Internet Res 20, (2018).
10. Conway, M., Hu, M. & Chapman, W. W. Recent Advances in Using Natural Language Processing to Address Public Health Research Questions Using Social Media and Consumer-Generated Data. Yearb Med Inform 28, 208–217 (2019).
11. Devinney, K. et al. Evaluating Twitter for Foodborne Illness Outbreak Detection in New York City. Online J Public Health Inform 10, (2018).
12. Dredze, M., Cheng, R., Paul, M. J. & Broniatowski, D. HealthTweets.org: A Platform for Public Health Surveillance using Twitter.
13. Rodríguez-Martínez, M. & Garzón-Alfonso, C. C. Twitter Health Surveillance (THS) System. Proc IEEE Int Conf Big Data 2018, 1647–1654 (2018).
14. Șerban, O., Thapen, N., Maginnis, B., Hankin, C. & Foot, V. Real-time processing of social media with SENTINEL: A syndromic surveillance system incorporating deep learning for health classification. Information Processing & Management 56, 1166–1184 (2019).