Jain, Deepak Kumar, Kumar, Akshi ORCID: https://orcid.org/0000-0003-4263-7168 and Sangwan, Saurabh Raj (2022) TANA: The amalgam neural architecture for sarcasm detection in Indian indigenous language combining LSTM and SVM with word-emoji embeddings. Pattern Recognition Letters, 160. pp. 11-18. ISSN 0167-8655
Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Sentiment analysis is a difficult task owing to the playful linguistic mannerisms, altered vocabulary and text-speak used on online forums. Humans tend to use words and phrases in ways that are incomprehensible to those not involved in the discourse. Sarcastic remarks in conversations are often used to mock others by saying something unpleasant. Sardonic or humorous statements and tones are used to insult others or make them appear puerile. Automated sarcasm detection is considered one of the key tasks for refining sentiment analysis, and extending it to Hindi, a morphologically rich, free-word-order indigenous Indian language, poses a further challenge. This research puts forward ‘The Amalgam Neural Architecture’ (TANA) to detect sarcasm in Hindi tweets. The architecture is trained on two embeddings, word embeddings and emoji embeddings, and combines an LSTM with the loss function of an SVM for sarcasm detection. We use the Sarc-H dataset, built by scraping Hindi-language tweets and annotating them manually based on two hashtags used by the tweeters: one pronounced kataaksh, which means sarcasm in Hindi, and one pronounced vyangya, another word for sarcasm in Hindi. The results are evaluated using various classification performance metrics; TANA achieves an F-score of 0.9675, outperforming both an LSTM with a softmax final layer and existing work.
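The abstract says only that TANA feeds word and emoji embeddings to an LSTM and trains it with an SVM loss instead of a softmax output layer. The NumPy sketch below illustrates that general idea and is not the authors' implementation. Assumptions (all hypothetical): each token is looked up in either a word table or an emoji table of the same dimension (the abstract does not say how TANA merges the two embeddings), the tweet representation is the last hidden state of a single-layer LSTM, the SVM loss is the squared hinge (L2-SVM) objective on a linear head with labels in {-1, +1}, and only that head is trained here. In a full model the hinge-loss gradient would also flow back into the LSTM and embeddings, typically via an autodiff framework. The vocabulary, tweets and labels are toy data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def embed(tokens, word_vocab, emoji_vocab, word_emb, emoji_emb):
    """Hypothetical lookup: emojis use the emoji table, everything else the word table.
    Unknown words fall back to row 0 of the word table (a <unk> row)."""
    rows = []
    for tok in tokens:
        if tok in emoji_vocab:
            rows.append(emoji_emb[emoji_vocab[tok]])
        else:
            rows.append(word_emb[word_vocab.get(tok, 0)])
    return np.stack(rows)  # shape (T, d)

def lstm_last_hidden(x_seq, W, U, b):
    """Single-layer LSTM forward pass; returns the hidden state after the last token."""
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b  # pre-activations of all four gates, shape (4*hidden,)
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def svm_objective(H, y, w, b, C=1.0):
    """Squared-hinge (L2-SVM) objective on a linear head; y must be in {-1, +1}."""
    margins = np.maximum(0.0, 1.0 - y * (H @ w + b))
    return 0.5 * w @ w + C * np.mean(margins ** 2)

def svm_step(H, y, w, b, lr=0.05, C=1.0):
    """One gradient-descent step on svm_objective with respect to (w, b) only."""
    margins = np.maximum(0.0, 1.0 - y * (H @ w + b))
    g = -2.0 * C * margins * y / len(y)  # d(data term)/d(score_i)
    return w - lr * (w + H.T @ g), b - lr * g.sum()

# Toy usage with invented data.
rng = np.random.default_rng(0)
d, hidden = 6, 8
word_vocab = {"<unk>": 0, "वाह": 1, "क्या": 2, "बात": 3}
emoji_vocab = {"🙄": 0, "😂": 1}
word_emb = rng.normal(size=(len(word_vocab), d))
emoji_emb = rng.normal(size=(len(emoji_vocab), d))
W = rng.normal(scale=0.3, size=(4 * hidden, d))
U = rng.normal(scale=0.3, size=(4 * hidden, hidden))
bias = np.zeros(4 * hidden)

tweets = [["वाह", "क्या", "बात", "🙄"], ["क्या", "बात", "😂"], ["बात"], ["वाह", "😂"]]
y = np.array([1.0, -1.0, -1.0, 1.0])  # +1 = sarcastic, -1 = not sarcastic (toy labels)
H = np.stack([
    lstm_last_hidden(embed(t, word_vocab, emoji_vocab, word_emb, emoji_emb), W, U, bias)
    for t in tweets
])

w, b = np.zeros(hidden), 0.0
start = svm_objective(H, y, w, b)
for _ in range(200):
    w, b = svm_step(H, y, w, b)
end = svm_objective(H, y, w, b)
pred = np.sign(H @ w + b)  # predicted class: +1 sarcastic, -1 not
print(start, end, pred)
```

The only change from a softmax classifier is the output objective: the head produces a single unbounded score per tweet and is penalised whenever that score falls inside the margin, rather than being pushed toward calibrated class probabilities.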