Linguistic features evaluation for hadith authenticity through automatic machine learning

Mohamed, Emad and Sarwar, Raheem (ORCID: https://orcid.org/0000-0002-0640-807X) (2022) Linguistic features evaluation for hadith authenticity through automatic machine learning. Digital Scholarship in the Humanities, 37 (3). pp. 830-843. ISSN 2055-7671

Accepted Version
Available under License In Copyright.

Official URL: http://dx.doi.org/10.1093/llc/fqab092

Abstract

There has been no research evaluating the linguistic features extracted from the matn (text) of a Hadith. Moreover, none of the fairly large corpora is publicly available as a benchmark for Hadith authenticity, and a 'gold standard' corpus is needed to support good practice in Hadith authentication. We wrote a scraper in Python and collected a corpus of 3,651 authentic prophetic traditions and 3,593 fake ones. We processed the corpora with morphological segmentation and performed extensive experiments with a variety of machine learning algorithms, mainly through automatic machine learning, to distinguish between these two categories. Using a feature set that includes words, morphological segments, characters, top N words, top N segments, function words, and several vocabulary richness features, we analyze the results in terms of both prediction and interpretability to explain which features are most characteristic of each class. Many experiments produced good results, and the highest accuracy (78.28%) was achieved by the Multinomial Naive Bayes classifier using word n-grams as features. Our experiments conclude that, at least for Digital Humanities, feature engineering may still be desirable because of the high interpretability of the features. The corpus and software (scripts) will be made publicly available to other researchers to promote progress and replicability.
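The best-performing configuration reported above pairs word n-gram features with a Multinomial Naive Bayes classifier. The following is a minimal, standard-library sketch of that general technique, not the authors' code. It assumes whitespace tokenisation, unigram plus bigram features, and Laplace (add-one) smoothing. The class name `MultinomialNaiveBayes`, the helper `word_ngrams`, and the toy documents are hypothetical illustrations. The paper's actual n-gram range, morphological segmentation, and AutoML pipeline are not reproduced here.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max (assumes whitespace tokenisation)."""
    tokens = text.split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class MultinomialNaiveBayes:
    """Hypothetical illustration of Multinomial NB over word n-gram counts."""

    def __init__(self, alpha=1.0, n_max=2):
        self.alpha = alpha  # additive (Laplace) smoothing; 1.0 is an assumption
        self.n_max = n_max  # longest n-gram used as a feature

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            grams = word_ngrams(doc, self.n_max)
            self.counts[label].update(grams)  # per-class n-gram frequencies
            self.vocab.update(grams)
        doc_freq = Counter(labels)
        # log P(class): proportion of training documents in each class
        self.log_prior = {c: math.log(doc_freq[c] / len(labels)) for c in self.classes}
        # total n-gram tokens observed per class (denominator of P(gram | class))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, gram, c):
        # smoothed log P(gram | class); Counter returns 0 for unseen grams
        v = len(self.vocab)
        return math.log((self.counts[c][gram] + self.alpha) / (self.totals[c] + self.alpha * v))

    def predict(self, doc):
        # n-grams never seen in training are ignored, as is usual for this model
        grams = [g for g in word_ngrams(doc, self.n_max) if g in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(g, c) for g in grams)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy placeholder data only; these are not texts from the corpus.
train_docs = ["alpha beta gamma", "alpha beta delta", "omega psi chi", "omega psi phi"]
train_labels = ["authentic", "authentic", "fake", "fake"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("alpha beta"))  # authentic
print(model.predict("omega psi"))   # fake
```

In practice the same setup is commonly built with scikit-learn's `CountVectorizer(ngram_range=(1, 2))` feeding `MultinomialNB`. The hand-written version above simply makes the counting and smoothing steps explicit, which matches the abstract's emphasis on interpretable features: the per-class n-gram counts can be inspected directly.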

Item Type:	Article (Article)
Peer-reviewed:	Yes
Date Deposited:	29 Jul 2024 13:02
Publisher:	Oxford University Press (OUP)
Additional Information:	This is a pre-copyedited, author-produced version of an article accepted for publication in Digital Scholarship in the Humanities following peer review. The version of record Emad Mohamed, Raheem Sarwar, Linguistic features evaluation for hadith authenticity through automatic machine learning, Digital Scholarship in the Humanities, Volume 37, Issue 3, September 2022, Pages 830–843 is available online at: https://doi.org/10.1093/llc/fqab092
Divisions:	Faculties > Business and Law
URI:	https://mmu-uat.leaf.cosector.com/id/eprint/635194
DOI:	https://doi.org/10.1093/llc/fqab092
ISSN:	2055-7671
e-ISSN:	2055-768X
