e-space: Manchester Metropolitan University's Research Repository

    Novel approach for quantitative and qualitative authors research profiling using feature fusion and tree-based learning approach

    Umer, M, Aljrees, T, Ullah, S and Bashir, AK (ORCID: https://orcid.org/0000-0001-7595-2522) (2023) Novel approach for quantitative and qualitative authors research profiling using feature fusion and tree-based learning approach. PeerJ Computer Science, 9. e1752. ISSN 2167-9843

    Published Version
    Available under License Creative Commons Attribution.

    Abstract

    Article citation creates a link between the cited and citing articles and serves as the basis for several measures of scientific achievement, such as author and journal impact factors, the H-index and the i10-index. Citations also include self-citations, in which authors cite their own articles. Self-citation is important when evaluating an author’s research profile and has received increasing attention recently. Although the literature offers different criteria for what counts as appropriate self-citation, self-citation has a substantial impact on a researcher’s scientific profile. This study examines two cases in this regard. In case 1, the qualitative aspect of an author’s profile is analyzed using hand-crafted feature engineering techniques. The sentiments conveyed through citations are integral to assessing research quality, as they can express appreciation or critique, or indicate that the cited work serves as a foundation for further research. Analyzing the sentiment of in-text citations remains a formidable challenge, even with automated sentiment annotation. For this purpose, this study employs machine learning models with term frequency (TF) and term frequency-inverse document frequency (TF-IDF) features. Random forest using TF features with the Synthetic Minority Oversampling Technique (SMOTE) achieved an accuracy of 0.9727. Case 2 deals with quantitative analysis and investigates direct and indirect self-citation. The list of the top 2% of researchers in 2020 is used as a baseline: the data of the top 25 Pakistani researchers are manually retrieved from this dataset, together with citation information from the Web of Science (WoS). Self-citation is estimated using the proposed model, and the results are compared with those obtained from WoS. Experimental results show a substantial difference between the two, as the self-citation ratio from the proposed approach is higher than that reported by WoS. It is observed that the citation counts reported by WoS for these authors are overstated. A comprehensive evaluation of a researcher’s profile must therefore include both direct and indirect self-citation.
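    The abstract does not spell out how direct and indirect self-citations are told apart. The sketch below assumes one common convention: a citation to the focal author's paper is direct if the focal author also wrote the citing paper, and indirect if the focal author did not but the citing and cited papers share at least one co-author. It is an illustration only, not the authors' proposed model; the `Paper` record, the function names and the toy data are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical minimal record: an ID and the set of author names."""
    paper_id: str
    authors: frozenset[str]


def classify_citation(focal_author: str, cited: Paper, citing: Paper) -> str:
    """Label one citation to a paper by `focal_author` (assumed convention).

    direct:   the focal author is also an author of the citing paper.
    indirect: the focal author is not, but the citing paper shares at least
              one co-author with the cited paper.
    external: no author overlap at all.
    """
    if focal_author in citing.authors:
        return "direct"
    if cited.authors & citing.authors:
        return "indirect"
    return "external"


def self_citation_profile(
    focal_author: str,
    papers: dict[str, Paper],
    citations: list[tuple[str, str]],
) -> tuple[Counter, float]:
    """Count citations received by the focal author's papers, by type, and
    return (counts, share of direct plus indirect self-citations)."""
    counts: Counter = Counter()
    for citing_id, cited_id in citations:
        cited = papers[cited_id]
        if focal_author not in cited.authors:
            continue  # skip citations to papers the focal author did not write
        counts[classify_citation(focal_author, cited, papers[citing_id])] += 1
    total = sum(counts.values())
    ratio = (counts["direct"] + counts["indirect"]) / total if total else 0.0
    return counts, ratio


# Toy, made-up data: P1 is written by A and B and cited by P2, P3 and P4.
papers = {
    "P1": Paper("P1", frozenset({"A", "B"})),
    "P2": Paper("P2", frozenset({"A", "C"})),  # A cites own work: direct
    "P3": Paper("P3", frozenset({"B", "D"})),  # co-author B cites: indirect
    "P4": Paper("P4", frozenset({"E"})),       # unrelated author: external
}
citations = [("P2", "P1"), ("P3", "P1"), ("P4", "P1")]

counts, ratio = self_citation_profile("A", papers, citations)
print(counts["direct"], counts["indirect"], counts["external"])  # 1 1 1
print(f"{ratio:.3f}")  # 0.667
```

    A count that ignored indirect self-citations would give 1/3 for the same toy data instead of 2/3, which illustrates why including both types raises the estimated self-citation ratio.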
