Sikosana, Mkululi, Ajao, Oluwaseun ORCID: https://orcid.org/0000-0002-6606-6569 and Maudsley-Barton, Sean ORCID: https://orcid.org/0000-0003-0289-0783 (2024) A comparative study of hybrid models in health misinformation text classification. In: OASIS 2024 : 4th International Workshop on Open Challenges in Online Social Networks (OASIS), 9 September 2024 - 13 September 2024, Poznan, Poland.
Published Version
Available under License Creative Commons Attribution.
Abstract
This study evaluates the effectiveness of machine learning (ML) and deep learning (DL) models in detecting COVID-19-related misinformation on online social networks (OSNs), with the aim of developing more effective tools for countering the spread of health misinformation during the pandemic. Various ML classifiers (Naive Bayes, SVM, Random Forest, etc.), DL models (CNN, LSTM and a hybrid CNN+LSTM) and pretrained language models (DistilBERT, RoBERTa) were trained and tested on the "COVID19-FNIR DATASET". The text was preprocessed using techniques such as stemming and lemmatization, and the models were evaluated on accuracy, F1 score, recall, precision and ROC. The results showed that SVM performed well, achieving an F1 score of 94.41%. DL models with Word2Vec embeddings exceeded 98% on all performance metrics (accuracy, F1 score, recall, precision and ROC). The hybrid CNN+LSTM models also exceeded 98% across all metrics, outperforming the pretrained models DistilBERT and RoBERTa. Our study concludes that DL and hybrid DL models are more effective than conventional ML algorithms for detecting COVID-19 misinformation on OSNs. The findings highlight the importance of advanced neural network approaches and large-scale pretraining in misinformation detection. Future research should optimize these models for different types of misinformation and adapt them to evolving OSNs, helping to combat health misinformation.
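The abstract names its evaluation metrics without defining them. The following is a minimal sketch of how they can be computed for a binary task. It assumes three things: label 1 marks misinformation and 0 marks reliable content, "ROC" refers to the area under the ROC curve (AUC), and the classifier outputs a score for each post. The function names and toy data are hypothetical. This is not the authors' evaluation code, and in practice a library such as scikit-learn would normally be used instead.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (assumed: misinformation) as positive."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P * N) and meant for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels, predictions and scores (not data from the study).
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1]
    scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6]
    print(confusion_counts(y_true, y_pred))  # tp=2, fp=1, fn=1, tn=2
    print(classification_metrics(y_true, y_pred))
    print(round(roc_auc(y_true, scores), 3))
```

Accuracy, precision, recall and F1 depend only on the thresholded predictions. AUC uses the raw scores, so it measures ranking quality independently of any decision threshold. For large datasets, a rank-based AUC computation scales better than the pairwise loop used here.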